Preface


It was our privilege to serve as the program chairs for CAV 2021, the 33rd International 
Conference on Computer-Aided Verification. CAV 2021 was held as a virtual con- 
ference during July 20-23, 2021. The tutorial days were on July 19 and July 24, 2021, 
and the pre-conference workshops were held during July 18-19, 2021. Due to the 
COVID-19 outbreak, all events took place online. 

CAV is an annual conference dedicated to the advancement of the theory and 
practice of computer-aided formal analysis methods for hardware and software sys- 
tems. The primary focus of CAV is to extend the frontiers of verification techniques by 
expanding to new domains such as security, quantum computing, and machine 
learning. This puts CAV at the cutting edge of formal methods research, and this year’s 
program is a reflection of this commitment. 

CAV 2021 received a very high number of submissions (290). We accepted 16 tool 
papers, 3 case studies, and 60 regular papers, which amounts to an acceptance rate of 
roughly 27%. The accepted papers cover a wide spectrum of topics, from theoretical 
results to applications of formal methods. These papers apply or extend formal methods 
to a wide range of domains such as concurrency, machine learning, and industrially 
deployed systems. The program featured keynote talks by Loris D’Antoni 
(UW-Madison), Corina Pasareanu (NASA), and Anna Slobodova (Centaur Technol- 
ogy, Inc.) as well as invited tutorials by Nate Foster (Cornell University), Zak Kincaid 
(Princeton) together with Tom Reps (UW-Madison), and Nadia Polikarpova (UC San 
Diego). Furthermore, we continued the tradition of Logic Lounge, a series of discus- 
sions on computer science topics targeting a general audience. 

In addition to the main conference, CAV 2021 hosted the following workshops: 
Formal Approaches to Certifying Compliance (FACC), Formal Methods for 
ML-Enabled Autonomous Systems (FoMLAS), Formal Methods for Blockchains 
(FMBC), Numerical Software Verification (NSV), Theory and Practice of String 
Solving (TPSS), Verifying Probabilistic Programs (VeriProP), Synthesis (SYNT), 
Satisfiability Modulo Theories (SMT), and Verification Mentoring Workshop (VMW). 

Organizing a flagship conference like CAV requires a great deal of effort from the 
community. The Program Committee for CAV 2021 consisted of 79 members — a 
committee of this size ensures that each member has to review only a reasonable 
number of papers in the allotted time. In all, the committee members wrote over 900 
reviews while investing significant effort to maintain and ensure the high quality of the 
conference program. We are grateful to the CAV 2021 Program Committee for their 
outstanding efforts in evaluating the submissions and making sure that each paper got a 
fair chance. Like last year’s CAV, we made the artifact evaluation mandatory for tool 
paper submissions and optional, but encouraged, for the rest of the accepted papers. 
This year saw an unprecedented number of 66 artifact submissions. The Artifact 
Evaluation Committee consisted of 72 members who put in significant effort to evaluate
each artifact. The goal of this process was to provide constructive feedback to tool
developers and help make the research published in CAV more reproducible. We are
also very grateful to the Artifact Evaluation Committee for their hard work and dedication
in evaluating the submitted artifacts.

CAV 2021 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2021 a success. First, we would like to thank Clément Pit-Claudel and Maria Schett for 
chairing the Artifact Evaluation Committee and John Cyphert for putting together the 
proceedings. We also thank Arie Gurfinkel for chairing the workshop organization, 
Bor-Yuh Evan Chang for managing sponsorship, Thomas Wies for arranging student 
fellowships, Norine Coenen for handling publicity, Leopold Haller for organising the 
Logic Lounge, and Peter Müller for putting together the Ask me Anything program. We
also thank Jean-Baptiste Jeannin and Arjun Radhakrishna for chairing the Mentoring 
Committee. Putting together an online conference is a complex task and we are grateful 
to the virtualization chair Tiago Ferreira, the student volunteer coordinators Tobias 
Kappé and Tao Gu, the local organizers for the Asia timezone, Ichiro Hasuo and 
Krishna S, and the team at Slides Live for all their efforts. Last but not least, we would 
like to thank the members of the CAV Steering Committee (Kenneth McMillan, Aarti 
Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important 
aspects of organizing CAV 2021. 

We hope that you will find the proceedings of CAV 2021 scientifically interesting 
and thought-provoking! 
Abstract. We present the first machine learning approach to the termination
analysis of probabilistic programs. Ranking supermartingales
(RSMs) prove that probabilistic programs halt, in expectation, within 
a finite number of steps. While previously RSMs were directly synthe- 
sised from source code, our method learns them from sampled execution 
traces. We introduce the neural ranking supermartingale: we let a neu- 
ral network fit an RSM over execution traces and then we verify it over 
the source code using satisfiability modulo theories (SMT); if the latter 
step produces a counterexample, we generate from it new sample traces 
and repeat learning in a counterexample-guided inductive synthesis loop, 
until the SMT solver confirms the validity of the RSM. The result is thus 
a sound witness of probabilistic termination. Our learning strategy is 
agnostic to the source code and its verification counterpart supports the 
widest range of probabilistic single-loop programs that any existing tool 
can handle to date. We demonstrate the efficacy of our method over a 
range of benchmarks that include linear and polynomial programs with 
discrete, continuous, state-dependent, multi-variate, hierarchical distri- 
butions, and distributions with undefined moments. 


1 Introduction 


Probabilistic programs are programs whose execution is affected by random vari- 
ables [17,19,23,29,36]. Randomness in programs may emerge from numerous 
sources, such as uncertain external inputs, hardware random number generators, 
or the (probabilistic) abstraction of pseudo-random generators, and is intrinsic 
in quantum programs [34]. Notable exemplars are randomised algorithms, cryp- 
tographic protocols, simulations of stochastic processes, and Bayesian inference 
[7,33]. Verification questions for probabilistic programs require reasoning about 
the probabilistic nature of their executions in order to appropriately characterise 
properties of interest. For instance, consider the following question, correspond- 
ing to the program in Fig. 1: will an ambitious marble collector eventually gather 
any arbitrarily large amounts of red and blue marbles? Intuitively, the question 
has an affirmative answer regardless of the initially established target amounts, 
since there is always a chance of collecting a marble of either color. Notice that, 
if the probabilistic choice is replaced with non-determinism, as often happens 
in software verification, an adversary may exclusively draw one color of marble 
© The Author(s) 2021 
and make the program run forever. The question that matches the original intuition
is whether the expected number of steps to termination is finite; this is the
positive almost-sure termination (PAST) question [8,10,13,19,27].


    while (red > 0 || blue > 0) do
        p ~ Bernoulli(0.01);
        if p == 1 then
            red = red - 1
        else
            blue = blue - 1
        fi
    od


Fig. 1. The ambitious marble collector (the variables red and blue are initialised non- 
deterministically). 


Probabilistic termination analysis is typically mechanised through the auto- 
mated synthesis of ranking supermartingales (RSMs), which are functions of the 
program variables whose value (i) decreases in expectation by a discrete amount 
across every loop iteration and (ii) is always bounded from below; an RSM 
formally witnesses that a program is PAST [10,13]. Early techniques for discov- 
ering RSMs reduced the synthesis problem from the source code of the program 
into constraint solving [10]. These methods have lent themselves to various gen- 
eralisations, including polynomial programs, programs with non-determinism, 
lexicographic and modular termination arguments, and persistence properties 
[2,14-16,20,25]. Recently, for special classes of probabilistic programs or term 
rewriting systems, novel automated proof techniques that leverage computer 
algebra systems and satisfiability modulo theories (SMT) have been introduced 
[5,6,38,39,41]. All the above methods are sound and, under specific assumptions, 
complete; they represent the state of the art for the class of programs they have 
been designed for. However, their assumptions are often too restrictive for the 
analysis of many simple programs. In particular, to the best of our knowledge, 
none can identify an RSM for the program in Fig. 1. For this simple program, it 
is easy to argue that the expected output of the neural network depicted in Fig. 2 
decreases after every iteration of the loop and that it is always non-negative (see 
Ex. 1). As such, this neural network is an appropriate RSM for the program. 


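[Figure: network diagram. Inputs red and blue each feed one ReLU neuron with weight 1; the two ReLU outputs are summed with weight 1.]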

Fig. 2. A neural ranking supermartingale for the program in Fig. 1. 
We present a novel method for discovering RSMs using machine learning
together with SMT solving. We introduce the neural ranking supermartingale
(NRSM) model, which lets a neural network mimic a supermartingale over sampled
execution traces from a program. We train an NRSM using standard optimisation
algorithms over a loss function that makes the neural network decrease, on
average, across sampled iterations. We phrase the certification problem as
that of computing a counterexample for the NRSM. To do so, we encode the
that of computing a counterexample for the NRSM. To do so, we encode the 
neural network together with the expected value of the program variables; then, 
we use an SMT solver for verifying that the expected output of the network 
decreases along every execution. If the solver falsifies the NRSM, then it pro- 
vides a counterexample that we use to guide a resampling of the execution 
traces; with this new data we retrain the neural network and repeat verifica- 
tion in a counterexample-guided inductive synthesis (CEGIS) fashion, until the 
SMT solver determines that no counterexample exists [4,44]. In the latter case, 
the solver has certified the generated NRSM; our method thus produces a sound 
PAST proof or runs indefinitely. Our procedure does not return for programs that 
are not PAST and may, in general, not return for some PAST instances. How- 
ever, we experimentally demonstrate that, in practice, our method succeeds over 
a broad range of PAST benchmarks within a few CEGIS iterations. Previously, 
machine learning has been applied to the termination analysis of deterministic 
programs and to the stability analysis of dynamical systems [1,12,21,24,28,30-32,42,43,45];
our method is the first machine learning approach for probabilistic
termination analysis. 

Our approach builds upon two key observations. First, the average of expres- 
sions along execution traces statistically approximates their true expected value. 
Thanks to this, we obtain a machine learning model for guessing RSM candidates 
that only requires execution traces and is thus agnostic to the source code. Sec- 
ond, solving the problem of checking an RSM is simpler than solving the entire 
termination analysis problem. Reasoning about source code is entirely delegated 
to the checking phase which, as such, supports programs that are out of reach 
to the available probabilistic termination analysers. 

We experimentally demonstrate that our method is effective over many pro- 
grams with linear and polynomial expressions, with both discrete and continuous 
distributions. This includes joint distributions, state-dependent distributions, 
distributions whose parameters are in turn random (hierarchical models), and 
distributions with undefined moments (e.g., the Cauchy distribution). We com- 
pare our method with a tool based on Farkas’ lemma and with the tools AMBER 
and ABSYNTH [2,39,41]; whilst our software prototype is slower than these alter- 
natives, it covers the widest range of benchmark single-loop programs. 

Summarising, our contribution is fivefold. First, we present the first machine 
learning method for the termination analysis of probabilistic programs. Second, 
we introduce a loss function for training neural networks to behave as ranking 
supermartingales over execution traces. Third, we show an approach to verify 
the validity of ranking supermartingales using SMT solving, which applies to 
a wide variety of single-loop probabilistic programs. Fourth, we experimentally 
demonstrate over multiple baselines and newly-defined benchmarks the practical
efficacy of our method. Fifth, we built a software prototype for evaluating our 
method. 


    x ∈ Vars                                          (variables)
    N ∈ ℝ                                             (numerals)
    op ::= + | - | * | && | || | < | <= | == | ...    (binary operators)
    E ::= x | N | E op E | -E                         (arithmetic expressions)
    D ::= Bernoulli(E) | Gaussian(E, E) | ...         (probability distributions)
    B ::= B op B | !B | E op E | true | false         (Boolean expressions)
    C ::= skip                                        (commands)
        | x = E                                       (deterministic assignment)
        | x ~ D                                       (probabilistic assignment)
        | C ; C                                       (sequential composition)
        | if B then C else C fi                       (conditional composition)


Fig. 3. Syntax of loop-free probabilistic programs. 


2 Termination Analysis of Probabilistic Programs 


We treat the termination analysis of single-loop probabilistic programs. We consider
an imperative language that includes C-like arithmetic and Boolean expressions,
and sequential and conditional composition of commands [13,17,19,23].


Syntax. A grammar for this language is shown in Fig. 3. We analyse single-loop 
programs of the form 
while G do 
U 
od 


where the loop guard G is a Boolean expression and the update statement U is 
a command. Variables are real-valued and can be either assigned to arithmetic 
expressions using the usual = operator, or sampled from probability distributions 
using the ~ operator. Probability distributions, which can be either discrete or 
continuous, take not only parameters that are constant, and thus known at 
compile time, but also parameters that depend on other variables, and thus 
determined only at run time. In other words, distributions may depend on the 
current state of the program, which is a random variable. Also, they may depend 
on other random variables; as such, distributions may be multi-variate, resulting 
from models with coupled and hierarchically-structured variables. 
Semantics. The operational semantics of a probabilistic program induces a probability
space over runs, together with a stochastic process [13]. A state of the
process is an element of $\mathbb{R}^n$ with $n = |\mathrm{Vars}|$, that is, a valuation of the variables
in the program. The space of outcomes $\Omega_{\mathrm{run}}$ of a program is the set of runs. A
run is a possibly infinite sequence of variable valuations (taken at the beginning
of every loop iteration). This comes with a $\sigma$-algebra $\mathcal{F}$ of measurable subsets of
$\Omega_{\mathrm{run}}$. Initial states are chosen non-deterministically and, thereafter, the process
is purely probabilistic. Every initial state $x_0 \in \mathbb{R}^n$ determines a unique probability
measure $\mathbb{P}^{(x_0)} : \mathcal{F} \to [0,1]$, namely a probability measure conditional on
the state $x_0$. The associated stochastic process is $X^{(x_0)} = \{X_t^{(x_0)}\}_{t \in \mathbb{N}}$, where
$X_t^{(x_0)}$ is a random vector representing the state at the $t$-th step, initialised as
$X_0^{(x_0)} = x_0$. Given an initial condition $x_0$ and a solution process $X^{(x_0)}$, the associated
termination time is a random variable $T^{(x_0)}$ denoting the length of an
execution, which takes values in $\mathbb{N} \cup \{\infty\}$.


Positive Almost-Sure Termination. Runs are probabilistic and thus also the 
notion of termination requires a quantitative semantics. The termination ques- 
tion is generalised to the notions of almost-sure and positive almost-sure termina- 
tion. Almost-sure termination (AST) indicates whether the joint probability of 
all runs that do not terminate is zero; positive almost-sure termination (PAST), 
which is stronger, indicates whether the expected number of steps to termination 
is finite. Formally, a probabilistic program terminates positively almost-surely
if $\mathbb{E}[T^{(x_0)}] < \infty$ for all $x_0 \in \mathbb{R}^n$. Notably, this implies that the program also
terminates almost-surely, that is, $\mathbb{P}^{(x_0)}[T^{(x_0)} < \infty] = 1$ for all $x_0 \in \mathbb{R}^n$. We provide
conditions ensuring that probabilistic programs are PAST and, consequently, 
that they are AST. Notice that the converse may not be true, that is, there 
exist programs that are AST but not PAST. Our method addresses the PAST 
question only, by building upon the theory of ranking supermartingales [10]. 
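
To make the PAST notion concrete, the following minimal sketch estimates the expected termination time of the program in Fig. 1 by simulation. It is only an illustration, not part of the method: the function names, the number of runs, and the step cap are assumptions. For the initial state red = blue = 1 the loop stops once both colours have been drawn at least once, so the exact value is 1/0.01 + 1/0.99 - 1 ≈ 100.01, which the estimate should approach.

```python
import random


def run_marble_collector(red, blue, rng, p_red=0.01, max_steps=10**6):
    """Run the loop of Fig. 1 once and return the number of iterations.

    max_steps is a safety cap added for this sketch only.
    """
    steps = 0
    while (red > 0 or blue > 0) and steps < max_steps:
        if rng.random() < p_red:  # p ~ Bernoulli(0.01) came out as 1
            red -= 1
        else:
            blue -= 1
        steps += 1
    return steps


def estimate_expected_steps(red, blue, runs=20000, seed=0):
    """Average the termination time over independent runs."""
    rng = random.Random(seed)
    total = sum(run_marble_collector(red, blue, rng) for _ in range(runs))
    return total / runs


estimate = estimate_expected_steps(1, 1)
exact = 1 / 0.01 + 1 / 0.99 - 1
print(f"estimated E[T] = {estimate:.2f}, exact E[T] = {exact:.2f}")
```

A finite sample mean does not prove PAST; it merely illustrates the quantity that an RSM bounds.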


Ranking Supermartingales. A scalar stochastic process $\{M_t\}$ is an RSM if, for
some $\epsilon > 0$ and lower bound $K \in \mathbb{R}$,

$$\mathbb{E}[M_{t+1} \mid M_t = m_t, \ldots, M_0 = m_0] \le m_t - \epsilon \tag{1}$$

and $M_t \ge K$ for all $t \ge 0$. In other words, this is a process whose values are
bounded from below and whose expected value decreases by a discrete amount
at each step of the program. We prove that a program is PAST by mapping
$X^{(x_0)}$ into an RSM. Our goal is finding a function $\eta : \mathbb{R}^n \to \mathbb{R}$ such that, for
every initial condition $x_0$, it satisfies the following two properties:

(i) $\mathbb{E}[\eta(X_{t+1}^{(x_0)}) \mid X_t^{(x_0)} = x] \le \eta(x) - \epsilon$ for all $x \in I$ and

(ii) $\eta(x) \ge K$ for all $x \in I$,

where $I \subseteq \mathbb{R}^n$ is some sufficiently strong loop invariant that can be the loop
guard or, possibly, a stronger condition. Function $\eta$ maps the entire stochastic
process into an RSM. For this reason, we call $\eta$ an RSM for the program.
Input:  Single-loop probabilistic program (G, U),
        Initial state x_0 ∈ ℝ^n
Output: Transition samples S ⊆ ℝ^n × P(ℝ^n)

    S ← ∅;
    P' ← {x_0};
    for i ← 1 to k do                      // k = path length
        P ← P';
        P' ← ∅;
        p ← pick arbitrary element from P;
        if eval(G, p) = True then
            for j ← 1 to m do              // m = branching factor
                P' ← P' ∪ {exec(U, p)}
            S ← S ∪ {(p, P')};
    return S


Algorithm 1: Interpreter 


Example 1. Consider the ambitious marble collector problem from Fig. 1. An
RSM for this program is a function $\eta$ mapping variables red and blue to $\mathbb{R}$.
Rephrasing condition (i) over this program, $\eta$ is required to satisfy

$$0.01 \cdot \eta(\mathit{red} - 1, \mathit{blue}) + 0.99 \cdot \eta(\mathit{red}, \mathit{blue} - 1) \le \eta(\mathit{red}, \mathit{blue}) - \epsilon, \tag{2}$$

for all $\mathit{red}, \mathit{blue} \in \mathbb{Z}$ that satisfy $\mathit{red} > 0 \lor \mathit{blue} > 0$, that is, the loop guard.
So, for example, function $\eta(\mathit{red}, \mathit{blue}) = \mathit{red} + \mathit{blue}$ satisfies this condition;
however, it may take any negative value over the arguments red and blue such
that $\mathit{red} > 0 \lor \mathit{blue} > 0$, thus violating condition (ii). By contrast, the neural
network in Fig. 2 succeeds at satisfying both conditions. In fact, the network
realises function $\eta(\mathit{red}, \mathit{blue}) = \max\{\mathit{red}, 0\} + \max\{\mathit{blue}, 0\}$, which satisfies
Eq. (2) and is bounded from below by zero.
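
The claims of Example 1 can be spot-checked by brute force on a bounded integer grid. The sketch below is an illustration under stated assumptions, not the paper's verifier (which uses SMT solving): the grid bound of 50, the choice of $\epsilon = 0.01$ and $K = 0$, the small floating-point tolerance, and all function names are hypothetical.

```python
def eta_relu(red, blue):
    """Candidate realised by the network in Fig. 2."""
    return max(red, 0) + max(blue, 0)


def eta_linear(red, blue):
    """Linear candidate that satisfies Eq. (2) but is unbounded from below."""
    return red + blue


def expected_next(eta, red, blue, p_red=0.01):
    """Left-hand side of Eq. (2): expected value of eta after one iteration."""
    return p_red * eta(red - 1, blue) + (1 - p_red) * eta(red, blue - 1)


def violations(eta, eps, lower_bound, bound=50, tol=1e-9):
    """Collect grid states inside the guard that break condition (i) or (ii)."""
    bad_decrease, bad_bound = [], []
    for red in range(-bound, bound + 1):
        for blue in range(-bound, bound + 1):
            if not (red > 0 or blue > 0):  # outside the loop guard
                continue
            if expected_next(eta, red, blue) > eta(red, blue) - eps + tol:
                bad_decrease.append((red, blue))
            if eta(red, blue) < lower_bound:
                bad_bound.append((red, blue))
    return bad_decrease, bad_bound


relu_decrease, relu_bound = violations(eta_relu, eps=0.01, lower_bound=0)
lin_decrease, lin_bound = violations(eta_linear, eps=0.01, lower_bound=0)
print("ReLU candidate violations:", len(relu_decrease), len(relu_bound))
print("linear candidate violations:", len(lin_decrease), len(lin_bound))
```

A finite grid check of this kind can only find counterexamples; it does not replace the proof over all integers.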


3 Training Neural Ranking Supermartingales 


Our framework synthesises RSMs by learning from program execution traces. We 
define a loss function that measures the number of sampled program transitions
that do not satisfy the RSM conditions. Applying gradient-descent optimisa- 
tion to the loss function guides the parameters to values at which the candi- 
date’s value decreases, on average, across sampled program transitions. Since 
the learner does not require the underlying program (only execution traces), 
the learner is agnostic to the structure of program expressions, and the cost of 
evaluating the loss function does not scale with the size of the program. 

A dataset of sampled transitions is produced using an instrumented program 
interpreter (Algorithm 1). At a program state p, the interpreter runs the loop 
body m times to sample successor states P’, where m is a branching factor hyper- 
parameter, before resuming execution from an arbitrarily chosen successor. The 
dataset S consists of the union of pairs (p, P’) generated by the interpreter. 
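
The sampling procedure of Algorithm 1 can be sketched in a few lines. This is a minimal illustration under assumptions: the program is supplied as two hypothetical Python callables `guard` and `update`, the "arbitrary" successor is chosen uniformly at random, and sampling stops as soon as the guard fails (Algorithm 1 does not spell this case out).

```python
import random


def interpreter(guard, update, x0, k, m, rng):
    """Sample transitions (p, P') along one path, in the spirit of Algorithm 1."""
    samples = []
    successors = [x0]
    for _ in range(k):                    # k = path length
        current = successors
        successors = []
        p = rng.choice(current)           # pick an arbitrary element
        if not guard(p):
            break                         # assumption: stop once the loop exits
        for _ in range(m):                # m = branching factor
            successors.append(update(p, rng))
        samples.append((p, successors))
    return samples


def marble_guard(state):
    red, blue = state
    return red > 0 or blue > 0


def marble_update(state, rng):
    """One execution of the loop body of Fig. 1."""
    red, blue = state
    if rng.random() < 0.01:
        return (red - 1, blue)
    return (red, blue - 1)


samples = interpreter(marble_guard, marble_update, (3, 5),
                      k=20, m=10, rng=random.Random(0))
print(len(samples), "transitions sampled; first state:", samples[0][0])
```

Each entry pairs a state with m independently sampled successors, which is what the loss function needs to approximate an expectation.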
[Figure: network diagram with labels "Learnable parameters" and "Sum of f".]


Fig. 4. Neural ranking supermartingale architecture. 


The loss function is used to optimise the parameters of an NRSM, whose 
architecture is shown in Fig.4. This is a neural network with n inputs, one 
output neuron, and one hidden layer. The hidden layer has h neurons, each of 
which applies an activation function f to a weighted sum of its inputs. In our 
experiments, the activation function $f$ is either $f(x) = x^2$ or $f(x) = \mathrm{ReLU}(x)$,
where $\mathrm{ReLU}(x) = \max\{x, 0\}$.

Therefore, we employ either of the two following functional templates, defined
over the learnable parameters $w_{i,j}$ and $b_i$:


- Sum of ReLU (SOR):

$$\eta(x_1, \ldots, x_n) = \sum_{i=1}^{h} \mathrm{ReLU}\Big(\sum_{j=1}^{n} w_{i,j} x_j + b_i\Big), \tag{3}$$

- Sum of Squares (SOS):

$$\eta(x_1, \ldots, x_n) = \sum_{i=1}^{h} \Big(\sum_{j=1}^{n} w_{i,j} x_j + b_i\Big)^2. \tag{4}$$


These choices of activation mean that our NRSMs are restricted to non-negative
outputs, and therefore satisfy condition (ii) by construction. The learner therefore
needs to find parameters that satisfy condition (i), which requires $\eta$ to
decrease in expectation by at least some positive constant $\epsilon > 0$.
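
The two templates of Eqs. (3) and (4) differ only in the activation. The following sketch evaluates both in plain Python; the list-of-lists parameter layout and the function names are assumptions made for illustration.

```python
def relu(z):
    return max(z, 0.0)


def square(z):
    return z * z


def nrsm(weights, biases, x, activation):
    """eta(x) = sum_i f(sum_j w[i][j] * x[j] + b[i]) with one hidden layer."""
    total = 0.0
    for w_i, b_i in zip(weights, biases):
        pre_activation = sum(w * x_j for w, x_j in zip(w_i, x)) + b_i
        total += activation(pre_activation)
    return total


def sor(weights, biases, x):
    """Sum of ReLU template, Eq. (3)."""
    return nrsm(weights, biases, x, relu)


def sos(weights, biases, x):
    """Sum of Squares template, Eq. (4)."""
    return nrsm(weights, biases, x, square)


# Parameters of the network in Fig. 2: eta(red, blue) = ReLU(red) + ReLU(blue).
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(sor(W, b, (3, -2)), sos(W, b, (3, -2)))
```

Both templates return a sum of non-negative terms, which is why condition (ii) holds with $K = 0$ regardless of the parameter values.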

The role of the loss function is to allow the learner parameters to be optimised 
such that the NRSM decreases, on average, across sampled transitions. That is, 
the loss function evaluates the number of sampled transitions for which the 
NRSM does not satisfy the RSM condition (i), and the lower its value, the more 
the neural network behaves like an RSM. 
Concretely, the loss associated with a state $p$ and its successors $P'$ is:

$$L(p, P') = \mathrm{softplus}\big(\mathbb{E}_{p' \sim P'}[\eta(p')] - \eta(p) + \epsilon\big), \tag{5}$$

where $\mathrm{softplus}(x) = \ln(1 + e^x)$, and $\mathbb{E}_{p' \sim P'}[\eta(p')]$ is the average of $\eta$ over the
sampled successor states $p'$ from $P'$.
We then train an NRSM by solving the following optimisation problem:

$$\min \; \frac{1}{|S|} \sum_{(p, P') \in S} L(p, P'), \tag{6}$$

which aims to minimise the average loss over all sampled transitions in the
dataset $S$, over the trainable weights $w_{1,1}, \ldots, w_{h,n} \in \mathbb{R}$ and biases $b_1, \ldots, b_h \in \mathbb{R}$.
This objective is non-convex and non-linear, and we resort to gradient-based
optimisation (see Sect. 6).
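
The objective of Eqs. (5) and (6) is straightforward to evaluate on a dataset of sampled transitions. The sketch below computes it for a fixed candidate; it does not perform the gradient-based optimisation itself, and the function names and the representation of states as tuples are assumptions.

```python
import math


def softplus(z):
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def transition_loss(eta, p, successors, eps):
    """Eq. (5): softplus of the average change of eta, shifted by eps."""
    mean_next = sum(eta(q) for q in successors) / len(successors)
    return softplus(mean_next - eta(p) + eps)


def dataset_loss(eta, samples, eps):
    """Objective of Eq. (6): average loss over all sampled transitions."""
    return sum(transition_loss(eta, p, succ, eps) for p, succ in samples) / len(samples)


def eta_fig2(state):
    red, blue = state
    return max(red, 0) + max(blue, 0)


toy_samples = [((2, 2), [(1, 2), (2, 1)])]
loss = dataset_loss(eta_fig2, toy_samples, eps=0.01)
print(f"loss = {loss:.4f}")
```

In practice the candidate's parameters would be updated by an automatic-differentiation framework to reduce this value; which framework is used is not assumed here.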

The softplus in Eq. (5) forces the parameters to satisfy condition (i) uni- 
formly across all sampled transitions in the dataset, rather than decreasing by 
a large amount in expectation over some transitions at the expense of failing to 
decrease sufficiently quickly for others. Furthermore, for NRSMs of SOR form we 
replace the ReLU activation function by softplus, to help gradient descent con- 
verge faster. Softplus approximates the ReLU function, and has the same asymp- 
totic behaviour, but results in an NRSM that is differentiable w.r.t. the network 
parameters at all inputs, unlike ReLU [22, p.193]. However, since softplus is a 
transcendental function, we revert to using a simpler ReLU activation when
verifying an SOR candidate. 


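[Figure: the interpreter takes the probabilistic program (G, U) and passes transition samples S to the learner; the learner passes an NRSM η to the verifier; the verifier either concludes PAST or returns a counterexample x_cex to the interpreter.]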


Fig. 5. CEGIS architecture for the adversarial training of NRSM. 


A CEGIS loop integrates the learner and verifier (Fig. 5). The dataset $S$
sampled by the interpreter is used to train an NRSM candidate $\eta$ according to
Eq. (6). The verifier checks whether $\eta$ satisfies condition (i), concluding either
that the program is PAST, or producing a counterexample program state $x_{\mathrm{cex}}$
for which $\eta$ does not satisfy (i). The interpreter generates new traces, starting
at $x_{\mathrm{cex}}$, forcing it to explore parts of the state space over which the NRSM fails
to decrease sufficiently in expectation.
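
The control flow of this loop can be summarised as a generic skeleton. The sketch below is an assumption-laden illustration: `sample`, `learn`, and `verify` are hypothetical callables standing in for the interpreter, learner, and verifier, and the iteration budget `max_iterations` is added here for convenience, whereas the loop described in the text may run indefinitely.

```python
def cegis(sample, learn, verify, initial_states, max_iterations=100):
    """Generic CEGIS skeleton.

    sample(state)     -> list of transition samples starting at state
    learn(dataset)    -> candidate fitted to the dataset
    verify(candidate) -> None if the candidate is valid, else a counterexample state
    Returns a verified candidate, or None if the budget is exhausted.
    """
    dataset = []
    for x0 in initial_states:
        dataset.extend(sample(x0))
    for _ in range(max_iterations):
        candidate = learn(dataset)
        counterexample = verify(candidate)
        if counterexample is None:
            return candidate                      # verified: sound PAST witness
        dataset.extend(sample(counterexample))    # resample from the counterexample
    return None                                   # nothing is concluded


# Toy stand-ins that only exercise the control flow.
def toy_sample(state):
    return [state]


def toy_learn(dataset):
    return max(dataset)


def toy_verify(candidate):
    return None if candidate >= 5 else candidate + 1


print(cegis(toy_sample, toy_learn, toy_verify, [0]))
```

Returning `None` on an exhausted budget deliberately says nothing about the program: failure to find a witness is not evidence of non-termination.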
[Figure: the probabilistic program (G, U) passes through the 'Encode' block and then the 'Marginalise' block; the NRSM η passes through the 'Round' block; the 'Verify' block takes both results and outputs either PAST or a counterexample x_cex.]


Fig. 6. Verifier architecture. 


4 Verifying Ranking Supermartingales by SMT Solving 


To verify an NRSM we must check that it decreases in expectation by at least
some constant (condition (i)). Condition (ii) is satisfied by construction because
the network’s output is non-negative for every input, leaving only condition (i)
to verify. The architecture of the verifier is depicted in Fig. 6. First, a program
(G, U) is translated into an equivalent logical formulation denoted by $\hat{G}$ and
$\hat{U}$ (‘Encode’ block), which are used to construct a closed-form term $\mathbb{E}[\eta]$ for
the NRSM’s expected value at the end of the loop body (‘Marginalise’ block).
Secondly, given an NRSM $\eta$, its parameters are rounded and encoded as a logical
term $\hat{\eta}$ (‘Round’ block). Then, the satisfiability of the following formula is decided
using SMT solving:

$$\hat{G}(x_1, \ldots, x_n) \land \mathbb{E}[\hat{\eta}](x_1, \ldots, x_n) > \hat{\eta}(x_1, \ldots, x_n) - \epsilon. \tag{7}$$

This is the dual satisfiability problem for the validity problem associated with
condition (i) in Sect. 2. If Eq. (7) is unsatisfiable, then $\eta$ is a valid RSM and we
conclude the program is PAST. Otherwise, the solver yields a counterexample
state $x_{\mathrm{cex}} \in \mathbb{R}^n$.
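
As a worked instance, consider formula (7) for the program in Fig. 1 with the candidate of Ex. 1. The derivation below is a sketch under stated assumptions: the parameters are taken to be exact (no rounding), $\epsilon = 0.01$ is chosen for illustration, and the variables range over the integers as in Ex. 1.

```latex
% Instance of (7) with \hat\eta(red, blue) = max{red, 0} + max{blue, 0}:
\[
(\mathit{red} > 0 \lor \mathit{blue} > 0) \;\land\;
0.01\,\hat\eta(\mathit{red} - 1, \mathit{blue}) + 0.99\,\hat\eta(\mathit{red}, \mathit{blue} - 1)
> \hat\eta(\mathit{red}, \mathit{blue}) - 0.01 .
\]
% Case analysis over integer states inside the guard:
\begin{align*}
\mathit{red} \ge 1,\ \mathit{blue} \ge 1 &: \quad
  \mathbb{E}[\hat\eta] = \mathit{red} + \mathit{blue} - 1
  \;\not>\; \mathit{red} + \mathit{blue} - 0.01, \\
\mathit{red} \ge 1,\ \mathit{blue} \le 0 &: \quad
  \mathbb{E}[\hat\eta] = 0.01(\mathit{red} - 1) + 0.99\,\mathit{red} = \mathit{red} - 0.01
  \;\not>\; \mathit{red} - 0.01, \\
\mathit{red} \le 0,\ \mathit{blue} \ge 1 &: \quad
  \mathbb{E}[\hat\eta] = 0.01\,\mathit{blue} + 0.99(\mathit{blue} - 1) = \mathit{blue} - 0.99
  \;\not>\; \mathit{blue} - 0.01 .
\end{align*}
```

No case satisfies the strict inequality, so the instance is unsatisfiable over the integers and the candidate is a valid RSM with $\epsilon = 0.01$. The integer domain matters here: over the reals the same formula is satisfied, for example, at red = 0.5, blue = 0, where the left-hand side is 0.495 and the right-hand side is 0.49.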

The rounding strategy (‘Round’ block) provides multiple candidates to the 
verifier by adding i.i.d. noise to parameters and rounding them to various preci- 
sions. Setting parameters that are numerically very small to zero is useful since 
learning that a parameter should be exactly zero could require an unbounded 
number of samples; rounding provides a pragmatic way of making this work in 
practice. If none of the generated candidates are valid NRSMs, all counterexam- 
ples are passed back to the interpreter which generates more transition samples 
for the learner (Fig. 5). 
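
A rounding strategy of this kind might look as follows. This is a hedged sketch, not the authors' implementation: the Gaussian noise, its scale, the zeroing threshold, the list of precisions, and the number of noisy copies per precision are all assumptions chosen for illustration.

```python
import random


def rounding_candidates(params, precisions=(0, 1, 2), noise_scale=0.01,
                        zero_threshold=1e-3, copies=2, seed=0):
    """Generate rounded variants of a parameter vector.

    For each precision (number of decimal digits) and each copy, add i.i.d.
    noise to every parameter, snap numerically tiny values to zero, and round.
    Duplicate candidates are dropped while preserving order.
    """
    rng = random.Random(seed)
    candidates = []
    for digits in precisions:
        for _ in range(copies):
            candidate = []
            for value in params:
                noisy = value + rng.gauss(0.0, noise_scale)
                if abs(noisy) < zero_threshold:
                    noisy = 0.0
                candidate.append(round(noisy, digits))
            candidates.append(tuple(candidate))
    return list(dict.fromkeys(candidates))


learned = [0.998, 0.0004, 1.003]
candidates = rounding_candidates(learned)
for candidate in candidates:
    print(candidate)
```

Each candidate would then be handed to the verifier separately; coarse roundings such as (1.0, 0.0, 1.0) are often the ones that admit an exact proof.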


    x ∈ Vars                                              (variables)
    N ∈ ℝ                                                 (numerals)
    τ ::= x | N | τ + τ | τ − τ | ...                     (terms)
    φ ::= ⊤ | ¬φ | φ ∧ φ | φ ∨ φ | τ < τ | τ = τ | ...    (formulae)


Fig. 7. Quantifier-free first-order logic formulae. 
Notice that, if a program’s guard predicate is not strong enough to allow a
valid RSM to be verified as such, the CEGIS loop will run indefinitely. In general, 
stronger supporting loop invariants may need to be provided. 


4.1 From Programs to Symbolic Store Trees 


We now introduce a translation from a loop-free probabilistic program to a 
symbolic store tree (Fig. 8), a data structure representing the distribution over
program states at the end of a loop iteration as a function of the variable valuation
at its start. Marginalising out the probabilistic choices made in the loop
yields the NRSM expectation $\mathbb{E}[\eta]$.


    π ::= τ | Bernoulli(τ) | Gaussian(τ, τ) | ...    (probabilistic terms)
    Σ ::= {x_1 ↦ π_1, ..., x_n ↦ π_n}                (symbolic store)
    σ ::= node(φ, σ, σ) | Σ                          (symbolic store tree)


Fig. 8. Symbolic store tree. 


This requires a form of symbolic execution. We represent program states
symbolically using symbolic stores, denoted $\Sigma$ (Fig. 8), which map program variables
to probabilistic terms. A probabilistic term $\pi$ can be either a first-order
logic term (Fig. 7) representing an arithmetic expression, or a placeholder for a
probability distribution whose parameters are terms (allowing them to be functions
of the program state). Finally, symbolic store trees $\sigma$ (Fig. 8) represent the
set of control-flow paths through the loop body, arising from if-statements; a
symbolic store tree is a binary tree with symbolic stores at the leaves, and internal
nodes labelled by logical formulae over program variables.


    enc(Σ, x) = Σ(x)
    enc(Σ, −O) = −enc(Σ, O)              enc(Σ, !O) = ¬enc(Σ, O)
    enc(Σ, O1 op O2) = enc(Σ, O1) ⟦op⟧ enc(Σ, O2)

    enc(Σ, skip) = Σ
    enc(Σ, x = E) = Σ[x' ↦ enc(Σ, E)]
    enc(Σ, C1 ; C2) = enc(enc(Σ, C1), C2)
    enc(Σ, if B then C1 else C2 fi) = node(enc(Σ, B), enc(Σ, C1), enc(Σ, C2))
    enc(node(φ, σ1, σ2), C) = node(φ, enc(σ1, C), enc(σ2, C))

    enc(Σ, x ~ Bernoulli(E)) = Σ[x' ↦ v, v ↦ Bernoulli(enc(Σ, E))]
    enc(Σ, x ~ Gaussian(E1, E2)) = Σ[x' ↦ v, v ↦ Gaussian(enc(Σ, E1), enc(Σ, E2))]

where every ~ command creates a fresh v variable.


Fig. 9. Translation from a loop-free command to a symbolic store tree. 
Figure 9 defines a translation from an initial symbolic store tree and command
to a new symbolic store tree characterising the distribution over states after
executing the command. At the top level, we provide the command U (the loop
body) and the initial symbolic store $\{x_1' \mapsto x_1, \ldots, x_n' \mapsto x_n\}$, where primed
variables represent the variable valuation at the end of the iteration, whereas
unprimed variables represent the variable valuation at the beginning of the loop.

The first four cases of Fig. 9 define the translation of arithmetic expressions 
(to terms) and Boolean expressions (to formulae), by replacing program syntax 
with the corresponding logical operators. 

The next four cases define the translation of commands. skip leaves the 
symbolic store unchanged. For deterministic assignments, the right-hand side
of the assignment is translated in the current symbolic store and bound to the 
variable. Sequential composition involves translating the first command, and 
translating the second command in the resulting store tree. A conditional state- 
ment creates a new node in the symbolic store tree that selects between the two 
recursively-translated branches, based on the formula derived from the guard 
predicate. These rules assume the store tree to be a leaf-level symbolic store, 
because the next rule handles the case where the initial symbolic store tree 
is a node. Finally, if the command is a probabilistic assignment, we translate 
the parameters to terms, and bind the resulting probabilistic term to a freshly 
generated symbol. This allows variables to be overwritten by multiple proba- 
bilistic sampling operations in the body of the loop. The mapping of variables 
to distributions in leaf-level stores defines the probability density over particular 
probabilistic choices. 
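
The translation rules can be prototyped compactly. The sketch below is one possible reading of Fig. 9, not the authors' implementation: the tuple encoding of expressions and commands, the use of strings for terms, the naming of fresh symbols as v0, v1, ..., and the decision to look up a variable x through its primed entry x' are all assumptions made for illustration.

```python
import itertools


def enc_expr(store, e):
    """Translate an arithmetic or Boolean expression to a term (as a string)."""
    kind = e[0]
    if kind == "var":
        return store[e[1] + "'"]          # current symbolic value of the variable
    if kind == "num":
        return str(e[1])
    if kind == "neg":
        return f"(-{enc_expr(store, e[1])})"
    if kind == "not":
        return f"(not {enc_expr(store, e[1])})"
    if kind == "op":
        _, op, left, right = e
        return f"({enc_expr(store, left)} {op} {enc_expr(store, right)})"
    raise ValueError(f"unknown expression {e!r}")


def enc(tree, cmd, fresh):
    """Translate a command under a symbolic store tree (dict leaf or node tuple)."""
    if isinstance(tree, tuple):           # node: push the command into both subtrees
        _, phi, left, right = tree
        return ("node", phi, enc(left, cmd, fresh), enc(right, cmd, fresh))
    store, kind = tree, cmd[0]
    if kind == "skip":
        return store
    if kind == "assign":
        _, x, e = cmd
        return {**store, x + "'": enc_expr(store, e)}
    if kind == "seq":
        return enc(enc(store, cmd[1], fresh), cmd[2], fresh)
    if kind == "if":
        _, b, c1, c2 = cmd
        return ("node", enc_expr(store, b), enc(store, c1, fresh), enc(store, c2, fresh))
    if kind == "sample":
        _, x, dist, args = cmd
        v = f"v{next(fresh)}"             # fresh symbol for this sampling operation
        params = ", ".join(enc_expr(store, a) for a in args)
        return {**store, x + "'": v, v: f"{dist}({params})"}
    raise ValueError(f"unknown command {cmd!r}")


# Loop body of Fig. 1.
body = ("seq",
        ("sample", "p", "Bernoulli", [("num", 0.01)]),
        ("if", ("op", "==", ("var", "p"), ("num", 1)),
         ("assign", "red", ("op", "-", ("var", "red"), ("num", 1))),
         ("assign", "blue", ("op", "-", ("var", "blue"), ("num", 1)))))
initial = {"red'": "red", "blue'": "blue", "p'": "p"}
tree = enc(initial, body, itertools.count())
print(tree)
```

For the marble collector this produces a single node guarded by `(v0 == 1)` with two leaf stores, one per branch of the conditional.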


Example 2. Figure 10 is the store tree produced for the ambitious marble collec- 
tor program (Fig. 1). Each leaf-level store in the program’s store tree corresponds 
to a particular control-flow path through the loop body. The interpretation of a 
symbolic store tree is that if we fix the outcomes of the probabilistic sampling 
operations performed by the loop body, then the state of the variables at the 
end of the iteration is determined by the predicates labelling the internal nodes. 


              v ≠ 1                          v = 1
    red'  ↦ red                    red'  ↦ red − 1
    blue' ↦ blue − 1               blue' ↦ blue
    p'    ↦ v                      p'    ↦ v
    v     ↦ Bernoulli(0.01)        v     ↦ Bernoulli(0.01)


Fig. 10. A store tree for the program in Fig. 1. 
4.2 Marginalisation


To construct the closed-form logical term representing the NRSM’s expected 
value at the end of an iteration, the probabilistic choices in the symbolic store 
tree must be marginalised out. If the program is limited to discrete random 
variables with finite support, we automatically marginalise the random choices 
by enumeration (for both SOR- and SOS-form NRSMs), as illustrated by Ex. 3. 


Example 3. The ambitious marble collector program of Fig. 1 yields the symbolic
store tree of Fig. 10. Suppose we want to marginalise the NRSM:


$$\eta(\mathit{red}, \mathit{blue}) = \mathrm{ReLU}(w_{1,1} \cdot \mathit{red} + w_{1,2} \cdot \mathit{blue} + b_1) + \mathrm{ReLU}(w_{2,1} \cdot \mathit{red} + w_{2,2} \cdot \mathit{blue} + b_2) \tag{8}$$


with respect to this symbolic store tree. We first apply the encoding of the NRSM
to each leaf-level symbolic store of Fig. 10, and enumerate the possible outcomes of
the probabilistic choices (which in this example is limited to v ∈ {0, 1}), using
the bindings of v to distributions in leaf-level stores to compute the probability
mass of each choice. After resolving the predicates for each choice of v, this
yields:


$$0.01 \cdot \eta(\mathit{red} - 1, \mathit{blue}) + 0.99 \cdot \eta(\mathit{red}, \mathit{blue} - 1). \tag{9}$$


The term (9) is then provided as the value of the NRSM’s expectation to the 
verifier. 
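
The enumeration of Example 3 can be sketched in a few lines of Python. This is an illustrative sketch rather than the prototype's code: the names `relu`, `make_nrsm` and `marginalise_marbles` are hypothetical, and the usage below plugs in the weights of Eq. (13) only as sample values. The leaf stores and the probability masses 0.01 and 0.99 are those of Fig. 10 and Eq. (9).

```python
from typing import Callable, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def make_nrsm(weights: Sequence[Sequence[float]],
              biases: Sequence[float]) -> Callable[[float, float], float]:
    """Build eta(red, blue) = sum_k ReLU(w[k][0]*red + w[k][1]*blue + b[k])."""
    def eta(red: float, blue: float) -> float:
        return sum(relu(w[0] * red + w[1] * blue + b)
                   for w, b in zip(weights, biases))
    return eta


def marginalise_marbles(eta: Callable[[float, float], float],
                        red: float, blue: float,
                        p_red: float = 0.01) -> float:
    """Enumerate v in {0, 1} and weight each leaf store by its probability mass."""
    leaves = {
        1: (red - 1, blue),   # v = 1: a red marble is collected
        0: (red, blue - 1),   # v != 1: a blue marble is collected
    }
    mass = {1: p_red, 0: 1.0 - p_red}
    return sum(mass[v] * eta(*leaves[v]) for v in (0, 1))


if __name__ == "__main__":
    eta = make_nrsm([[0, 1], [1, 0]], [11, 11])
    print(eta(5, 5), marginalise_marbles(eta, 5, 5))
```

The same loop over outcomes extends to any finite support; only the `leaves` and `mass` tables change.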


If the program samples from continuous distributions, we marginalise SOS-form
NRSMs (but not SOR-form NRSMs) by substituting symbolic moments
for a set of supported built-in distributions, including Gaussian,
MultivariateGaussian, and Exponential, though this set could include any
distribution whose closed-form symbolic moments are available. Example 4
illustrates this. This strategy is general enough to support a wide variety of
programs, including those of Sect. 5. If a sampling distribution lacks symbolic
moments, the cumulative distribution function can be used instead, as illustrated
in the slicedcauchy case study (Fig. 15). 


Example 4. Consider an NRSM $\eta(x) = (wx + b)^2$ and a symbolic store tree
node(p = 1, σ1, σ2) where σ1 = {x ↦ x + v, v ↦ Exp(λ), p ↦ Bernoulli(3/4)}
and σ2 = {x ↦ x − v, v ↦ Exp(λ), p ↦ Bernoulli(3/4)}. Exp(λ) denotes
the exponential distribution with parameter λ, with pdf denoted $p_{\mathrm{Exp}(\lambda)}(v)$.
We apply η to each leaf-level symbolic store, and marginalise the probabilistic
choices. We marginalise p first by enumerating over its possible values, and
then marginalise v. There are no dependencies between the distributions in this
example, so the order in which they are marginalised does not matter.


$$\int_0^\infty \left( \tfrac{3}{4}\, \eta(x + v) + \tfrac{1}{4}\, \eta(x - v) \right) p_{\mathrm{Exp}(\lambda)}(v)\, dv. \tag{10}$$

The result of marginalisation is a closed-form expression for Eq. (10). Note that
since


$$\eta(x + v) = w^2 v^2 + 2(wx + b)\, w v + (wx + b)^2 \tag{11}$$

and $\int_0^\infty v^n\, p_{\mathrm{Exp}(\lambda)}(v)\, dv = \frac{n!}{\lambda^n}$, we use linearity of integration to perform the
following simplification, by substituting expressions for the moments of v in
terms of the parameter λ:


$$\int_0^\infty \eta(x + v)\, p_{\mathrm{Exp}(\lambda)}(v)\, dv = \frac{2w^2}{\lambda^2} + \frac{2(wx + b)\, w}{\lambda} + (wx + b)^2, \tag{12}$$



which is used to reduce Eq. (10) to a closed form. This is the method used to 
perform marginalisation for several case studies, including crwalk, gaussrw and 
expdistrw. 
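
The moment substitution of Example 4 can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the midpoint quadrature and the truncation point `60 / lam` are hypothetical choices. `closed_form_expectation` applies Eq. (12) to both branches (with v replaced by −v in the second), and `numeric_expectation` approximates the integral of Eq. (10) directly.

```python
import math


def closed_form_expectation(w: float, b: float, x: float, lam: float) -> float:
    """3/4 * E[eta(x+v)] + 1/4 * E[eta(x-v)] for eta(x) = (w*x + b)**2, v ~ Exp(lam)."""
    m1 = 1.0 / lam           # first moment of Exp(lam)
    m2 = 2.0 / lam ** 2      # second moment of Exp(lam)
    c = w * x + b
    plus = w * w * m2 + 2 * c * w * m1 + c * c    # Eq. (12)
    minus = w * w * m2 - 2 * c * w * m1 + c * c   # same expansion for x - v
    return 0.75 * plus + 0.25 * minus


def numeric_expectation(w: float, b: float, x: float, lam: float,
                        steps: int = 100_000) -> float:
    """Midpoint-rule approximation of Eq. (10), truncated at 60 / lam."""
    upper = 60.0 / lam
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        eta_plus = (w * (x + v) + b) ** 2
        eta_minus = (w * (x - v) + b) ** 2
        density = lam * math.exp(-lam * v)
        total += (0.75 * eta_plus + 0.25 * eta_minus) * density * h
    return total


if __name__ == "__main__":
    print(closed_form_expectation(0.1, -3.3, 2.0, 2.0))
    print(numeric_expectation(0.1, -3.3, 2.0, 2.0))
```

Agreement of the two values on sample inputs is only a sanity check; the verifier needs the closed form because it must reason about all states symbolically.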


Notably, our verifier requires the expected value of the RSM to be com- 
puted (or soundly approximated) in closed form. We automate marginalisation 
for discrete distributions of finite support, but require manual intervention for 
continuous distributions. Nevertheless, our learning component is automated in 
both cases. Characterising the space of programs with continuous distributions 
that admit fully automated verification of an RSM is an open question. 


5 Case Studies 


Existing tools for synthesising RSMs reduce the problem to constraint-solving 
[2,10,11,14], which can limit the generality of the synthesis framework. For 
instance, methods that convert the RSM constraints into a linear program using 
Farkas’ lemma can only handle programs with affine arithmetic, and can only 
synthesise linear/affine (lexicographic) RSMs [2,10]. A second restriction of existing
approaches is that they typically require the moments of distributions to be 
compile-time constants. This rules out programs whose distributions are deter- 
mined at runtime, such as hierarchical and state-dependent distributions. Since 
the loss function of Eq. (6) only requires execution traces, our learner is agnostic 
to the structure of program expressions, imposing minimal restrictions on the 
kinds of expressions that can occur, or the kinds of distributions that can be 
sampled from. This allows us to learn RSMs for a wider class of programs com- 
pared to existing tools, as we will illustrate in this section using a number of 
case studies. 


5.1 Non-linear Program Expressions and NRSMs 


Many simple programs, such as the one in Fig. 1, do not admit linear or polynomial
RSMs. Since that program cannot be encoded as a prob-solvable loop (due to the
disjunctive guard predicate, which cannot be replaced by a polynomial inequality),
it cannot be handled by another recent tool, AMBER [39]. However, this program
admits the following piecewise-linear NRSM:


$$\mathrm{ReLU}(0 \cdot \mathit{red} + 1 \cdot \mathit{blue} + 11) + \mathrm{ReLU}(1 \cdot \mathit{red} + 0 \cdot \mathit{blue} + 11), \tag{13}$$


whose parameters are learnt by our method within the first CEGIS iteration. 


 1 while (i <= 10 && s > 0) do
 2   r ~ DiscreteUniform({-2, 2});
 3   s = r + s * i;
 4   p ~ Bernoulli(3/4);
 5   if (p == 1) then
 6     i = i + 1
 7   else
 8     i = i - 1
 9   fi
10 od


Fig. 11. Probabilistic factorial (probfact). 


Similarly, we learn the piecewise-linear NRSM: 


$$\mathrm{ReLU}(-1 \cdot i + 0 \cdot s + 12) + \mathrm{ReLU}(0 \cdot i + 0 \cdot s + 9) \tag{14}$$


for the program in Fig. 11, which contains a bilinear assignment (cf. multiplica- 
tion of s and i on line 3), so this program is not supported by [2]. The conjunction 
in the guard means it is not supported by AMBER, either. 
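
To see why Eq. (14) behaves as an RSM on probfact, the expectation after one loop iteration can be enumerated exactly. The sketch below is illustrative only: the helper names are hypothetical, and DiscreteUniform({-2, 2}) is read as the two-point distribution on −2 and 2. Checking a handful of states is a sanity check, not a proof; the verifier establishes the decrease for all states satisfying the guard.

```python
from fractions import Fraction
from itertools import product


def relu(z):
    return max(0, z)


def eta(i, s):
    """The NRSM of Eq. (14)."""
    return relu(-1 * i + 0 * s + 12) + relu(0 * i + 0 * s + 9)


def step(i, s, r, p):
    """One iteration of the loop body of Fig. 11 for fixed outcomes r and p."""
    s = r + s * i
    i = i + 1 if p == 1 else i - 1
    return i, s


def expected_eta_after_step(i, s):
    """Exact expectation of eta after one iteration, by enumerating r and p."""
    r_dist = {-2: Fraction(1, 2), 2: Fraction(1, 2)}
    p_dist = {1: Fraction(3, 4), 0: Fraction(1, 4)}
    return sum(pr * pp * eta(*step(i, s, r, p))
               for (r, pr), (p, pp) in product(r_dist.items(), p_dist.items()))


def expected_decrease(i, s):
    return eta(i, s) - expected_eta_after_step(i, s)


if __name__ == "__main__":
    for i in (0, 5, 10):
        print(i, expected_decrease(i, 3))
```

On every state with i ≤ 10 the first ReLU stays active after the update, so the decrease is the constant 3/4 · 1 − 1/4 · 1 = 1/2, independently of s.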


1 while (x < 10) do
2   rho ~ ContinuousUniform(-0.5, 1);
3   covM = [[1, rho], [rho, 1]];
4   w1, w2 ~ MultivariateGaussian([0, 0], covM);
5   x = x + power((w1 + w2), 2) - 2
6 od


Fig. 12. Random walk with correlated variables (crwalk). 


5.2 Multivariate and Hierarchical Distributions 


Figure 12 is a random walk that samples from a multivariate Gaussian distribution
with zero mean, unit variances, and correlation sampled uniformly from the
range [−1/2, 1]. The MultivariateGaussian of line 4 is an instance of a hierarchical
distribution, having parameters that are random variables. This program
also contains a non-linear (polynomial) expression that updates the value of x.
For crwalk we learn an SOS-form NRSM:


$$(0.1 \cdot x - 47.2)^2, \tag{15}$$

proving this program is PAST. To verify this, the NRSM expectation is computed
via the symbolic moments of the multivariate Gaussian distribution, given
its covariance matrix (line 3), and then marginalising w.r.t. rho (again, using
the moments of the uniform distribution over [−1/2, 1]). Unfortunately, it is
challenging to translate many simple programs containing hierarchical distributions
into ones that can be handled by existing tools. For instance, although it is 
possible to simulate sampling from a bivariate Gaussian of arbitrary correla- 
tion by sampling from independent standard Gaussian distributions, this would 
involve computing a non-polynomial function of the correlation. Similarly, for 
the program in Fig. 14 (further discussed below), if a variable is exponentially
distributed, X ~ Exponential(1), then X/λ ~ Exponential(λ), providing a way
of simulating an exponential distribution with arbitrary parameter λ. However,
this again requires a non-polynomial program expression (i.e. the reciprocal of
λ) when λ is part of the program state and not a constant, and is therefore out of
scope for methods that restrict program expressions to being linear/polynomial. 
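
The two-stage marginalisation used for crwalk (first the Gaussian moments given rho, then the uniform moments of rho) can be worked through with exact rational arithmetic. The sketch below is an illustration under stated assumptions rather than the prototype's code: the function names are hypothetical, and it uses the standard facts that, for unit variances and correlation rho, w1 + w2 is Gaussian with mean 0 and variance 2 + 2·rho, and that a zero-mean Gaussian with variance σ² has fourth moment 3σ⁴.

```python
from fractions import Fraction


def uniform_moments(a, b):
    """First and second raw moments of ContinuousUniform(a, b)."""
    m1 = (a + b) / 2
    m2 = (a * a + a * b + b * b) / 3
    return m1, m2


def crwalk_step_moments():
    """E[d] and E[d^2] for the update d = (w1 + w2)^2 - 2 of Fig. 12."""
    rho1, rho2 = uniform_moments(Fraction(-1, 2), Fraction(1))
    # Given rho: E[(w1+w2)^2] = 2 + 2*rho and E[(w1+w2)^4] = 3*(2 + 2*rho)^2.
    # Marginalising rho replaces rho and rho^2 by their uniform moments.
    e_sq = 2 + 2 * rho1
    e_fourth = 3 * (4 + 8 * rho1 + 4 * rho2)
    e_d = e_sq - 2
    e_d2 = e_fourth - 4 * e_sq + 4
    return e_d, e_d2


def expected_decrease(x, w=Fraction(1, 10), b=Fraction(-472, 10)):
    """eta(x) - E[eta(x + d)] for eta(x) = (w*x + b)^2, as in Eq. (15)."""
    e_d, e_d2 = crwalk_step_moments()
    c = w * x + b
    expected_next = c * c + 2 * c * w * e_d + w * w * e_d2
    return c * c - expected_next


if __name__ == "__main__":
    print(crwalk_step_moments())
    print(expected_decrease(0), expected_decrease(10))
```

Under these assumptions the decrease works out to 4.57 − 0.01·x, which stays above 4.47 for every x < 10.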


5.3 State-Dependent Distributions and Non-Linear Expectations 


while (x < 0 && y < 0) do
  s1 ~ Gaussian(0, 1/4);
  vx = min(2, max(0.1, vx + s1));
  s2 ~ Gaussian(0, 1/4);
  vy = min(2, max(0.1, vx + s2));
  s3 ~ Gaussian(0, 1/4);
  rho = min(1, max(-1, rho + s3));
  mean = [sqrt(1 + power(x, 2)), sqrt(1 + power(y, 2))];
  cov = rho * sqrt(vx * vy);
  covM = [[vx, cov], [cov, vy]];
  w1, w2 ~ MultivariateGaussian(mean, covM);
  x = x + w1;
  y = y + w2
od


Fig. 13. Gaussian random walk with time-varying and coupled noise (gaussrw). 


Once we allow hierarchical distributions, it is natural to consider state-dependent 
distributions, i.e. distributions whose parameters depend on the program state 
rather than being sampled from other distributions. As an example, consider the 
program in Fig. 13 (a 2-dimensional Gaussian random walk with state-dependent 
moments). This is unsupported by existing tools because the mean of the Gaus- 
sian is a non-polynomial function of the program state. However, after defining 
the function $\sqrt{1 + x^2}$ by means of the following polynomial logical inequalities:


$$\mathit{mu\_x}^2 = 1 + x^2 \tag{16}$$
$$\mathit{mu\_x} \geq 1 \tag{17}$$

(similarly for mu_y), we express the expected value of an SOS-form NRSM in
terms of symbolic moments mu_x, etc. Since these moments are state-dependent,
we cannot marginalise them out as in the hierarchical case. Instead we perform
non-deterministic abstraction, providing inequalities 0.1 ≤ vx, vy ≤ 2 and
−1 ≤ rho ≤ 1 as further verifier assumptions. 


while (x < 10) do
  s ~ Gaussian(0, 1);
  lambda = min(10, max(1, lambda + s));
  step ~ Exponential(lambda);
  p ~ Bernoulli(3/4);
  if (p == 1) then
    x = x + step
  else
    x = x - step
  fi
od


Fig. 14. State-dependent exponential random walk (expdistrw). 


Even if program expressions are linear, the presence of state-dependent
distributions can result in a non-linear verification problem, if the moments are
themselves non-linear functions of the program variables. For instance, the
program in Fig. 14 represents a 1-dimensional random walk, with steps sampled
from an exponential distribution. Since the n-th moment of Exponential(λ) is
$n!/\lambda^n$, the expectation of an SOS-form NRSM is non-polynomial but still
expressible in the theory of non-linear real arithmetic (see Ex. 4). For expdistrw we
learn


$$(0.1 \cdot x - 3.3)^2, \tag{18}$$

whereas for gaussrw in Fig. 13 we learn


$$(0 \cdot x - 1 \cdot y + 11)^2 + (0 \cdot x + 0 \cdot y + 8)^2. \tag{19}$$


We translate the program in Fig. 14 for AMBER by replacing the update for λ
by instead sampling it uniformly from [1,10]. AMBER correctly identifies that the
program is AST, and that (10 − x) is a supermartingale expression (note, not an
RSM), though it does not report that the program is PAST (answering “maybe”). 


5.4 Undefined Moments 


The ability to evaluate the cumulative distribution function (CDF) of a sampled 
distribution could be useful in marginalisation, even if the moments of the sam- 
pled distribution are undefined or not known analytically to infinite precision. 
An example is Fig. 15: the program samples from the standard Cauchy distri- 
bution, for which all moments are undefined. Since the sampled value is only 
used to determine which branch of a conditional is taken, the RSM expectation
is well defined, and can be expressed in terms of the standard Cauchy CDF.
Namely, the if-branch is taken with probability $q = 1 - \left(\frac{1}{\pi}\arctan(10) + \frac{1}{2}\right)$. This
equation is not expressible using polynomials, so we perform a sound
approximation by introducing a new variable that is quantified over a small interval
surrounding a finite-precision approximation to q. This allows us to learn and
verify the SOR-form NRSM:


$$\mathrm{ReLU}(1.2 \cdot x + 9.1). \tag{20}$$
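
The CDF-based marginalisation for slicedcauchy can be illustrated numerically. The sketch below is not the authors' implementation: the function names and the interval half-width `eps` are hypothetical (the text only says the interval is "small"). It computes q from the standard Cauchy CDF and evaluates the expected decrease of Eq. (20) for values of q drawn from the enclosing interval.

```python
import math


def cauchy_tail_probability(threshold: float) -> float:
    """P(p > threshold) for p ~ StandardCauchy, using F(t) = atan(t)/pi + 1/2."""
    return 1.0 - (math.atan(threshold) / math.pi + 0.5)


def enclosing_interval(q: float, eps: float = 1e-4):
    """A small interval around a finite-precision approximation of q (eps is an assumption)."""
    return q - eps, q + eps


def relu(z: float) -> float:
    return max(0.0, z)


def expected_decrease(x: float, q: float, w: float = 1.2, b: float = 9.1) -> float:
    """eta(x) - E[eta(x')] for eta(x) = ReLU(w*x + b) and one iteration of Fig. 15."""
    expected_next = q * relu(w * (x + 2) + b) + (1 - q) * relu(w * (x - 1) + b)
    return relu(w * x + b) - expected_next


if __name__ == "__main__":
    q = cauchy_tail_probability(10)
    lo, hi = enclosing_interval(q)
    print(q, expected_decrease(1.0, lo), expected_decrease(1.0, hi))
```

For x > 0 the ReLU is active on both successors, so the decrease reduces to 1.2 · (1 − 3q), which is positive for every q in the interval.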


while (x > 0) do
  p ~ StandardCauchy();
  if (p > 10) then
    x = x + 2
  else
    x = x - 1
  fi
od


Fig. 15. Sliced Cauchy distribution (slicedcauchy). 


For our experimental evaluation (Sect. 6) we create a modified version of each 
of the six case studies described in this section, as follows: 


- program marbles3 is a generalisation of marbles to three marble types,
  instead of two;

- probfact2 uses 5/8 as the Bernoulli parameter, rather than 3/4;

- crwalk2 samples rho from a Beta(1,3) distribution, instead of a uniform
  distribution over [−1/2, 1];

- expdistrw2 samples from an exponential distribution, where parameter
  lambda is replaced by lambda*lambda;

- gaussrw2 uses $[3 + 1/(1 - x),\ 3 + 1/(1 - y)]^T$ for its mean vector, instead of
  $[\sqrt{1 + x^2},\ \sqrt{1 + y^2}]^T$; and

- slicedcauchy2 has a loop guard of x < 10, instead of x > 0, and swaps the
  two branches of the conditional.


5.5 Rare Transitions 


A limitation of relying on a sampled transition dataset to learn NRSM parameters
is that we rely on the average $\mathbb{E}_{p' \sim P'}[\eta(p')]$ in Eq. (5) being accurate (see Sect. 3). 
This assumption is challenged by programs that have certain control-flow paths 
of very low probability, which are unlikely to be sampled by the interpreter. For 
example, in the context of the ambitious marble collector (Fig. 1), Fig. 16 shows 
that when the probability of obtaining a red marble decreases below $2^{-7}$, our 
success rate drops. This is because a lower probability makes the corresponding 
control-flow path rarer in the dataset, to the point where the expected value of 
the NRSM cannot be estimated accurately. 
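
The effect of rare control-flow paths on a sampled dataset can be quantified with a short calculation. The sketch below is a generic illustration, not part of the described framework: the function names are hypothetical, and it assumes that the n sampled loop iterations are independent and each take the rare path with the same probability p.

```python
import math


def prob_path_missing(p: float, n: int) -> float:
    """Probability that n independent iterations never take a path of probability p."""
    return (1.0 - p) ** n


def samples_needed(p: float, miss_prob: float) -> int:
    """Smallest n such that the path is missed with probability at most miss_prob."""
    return math.ceil(math.log(miss_prob) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.01, 2 ** -7, 2 ** -10):
        print(p, samples_needed(p, 0.05))
```

The required number of samples grows roughly like 1/p, which is consistent with the drop in success rate once the path becomes rare relative to the dataset size.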


Fig. 16. Success rate and execution times for the ambitious marble collector program
(Fig. 1), where p is the probability of taking the if-branch. Success rate refers to the
fraction of 10 executions that succeeded in finding an NRSM before a timeout of 300 s.
Execution times show the median time with the error bar ranging between the minimum
and maximum times of the 10 executions. 


6 Experimental Results 


We built a prototype implementation of our framework (in Python) and present
experimental results for benchmarks adapted from previous work, as well as our
own case studies (from Sect. 5). The case studies illustrate programs for which
our framework synthesises an RSM, yet which existing tools cannot prove to be PAST.

The learner is implemented with JAX [9]. To train NRSMs, we use AdaGrad
[18] for gradient-based optimisation, with a learning rate of $10^{-2}$. Parameters
are initialised by sampling from Gaussian distributions: weight parameters are
sampled from a zero-mean Gaussian, whereas the bias parameters are sampled
either from a Gaussian with mean 10 (for SOR candidates) or mean 0 (for SOS
candidates). We verify the NRSMs using the SMT solver Z3 [26,40]. The outcomes
are obtained on the following platform: macOS Catalina version 10.15.4,
8 GB RAM, Intel Core i5 CPU 2.4 GHz quad-core, 64-bit. 

As mentioned in Sect. 4, the verifier checks a candidate NRSM over states 
satisfying the loop predicate, which characterises the set of reachable states. For 
our experiments, we manually provide the NRSM expectation, and augment the 
guard predicate with additional invariants where necessary. We generate out- 
comes using two different rounding strategies (Sect. 4): an “aggressive” rounding 
strategy which generated between 80 and 120 candidates per CEGIS iteration, 
and a “weaker” rounding strategy producing between 15 and 25 candidates per 
CEGIS iteration. The outcomes in Table 1 used the aggressive rounding strategy. 


Table 1. Experimental results over existing (top section) and newly added benchmarks
(bottom section); (c) indicates the benchmark uses continuous distributions, (d)
indicates it only uses discrete distributions. All reported times are in seconds, oot
indicates time-out after 300 s, n/a indicates the tool terminated without definite answer,
and – indicates the benchmark is unsupported. Our method is run 10 times with different
seeds; the overall success rate is reported. Runtimes of interpretation, training,
verification phases, and # of CEGIS iterations refer to the run with median total
runtime.


| Program                | AMBER [39] | Farkas’ lemma [2] | ABSYNTH [41] | Succ. rate | Inter. | Train. | Verif. | #iter | NRSM |
|------------------------|------------|-------------------|--------------|------------|--------|--------|--------|-------|------|
| Hare & Tortoise (d)    | 0.04       | z0                | 0.09         | 10/10      | 0.61   | 3.86   | 0.70   | 0     | SOR  |
| exmini/terminate (d)   | –          | 0.02              | oot          | 10/10      | 1.75   | 29.35  | 7.67   | 2     | SOR  |
| aaron2 (d)             | 0.03       | 0.02              | 0.02         | 10/10      | 0.04   | 2.27   | 0.01   | 0     | SOR  |
| catmouse (c)           | 0.03       | 0.02              | –            | 9/10       | 0.39   | 12.41  | 3.68   | 1     | SOS  |
| counterex1c (d)        | –          | 0.02              | 0.22         | 8/10       | 1.00   | 6.71   | 0.02   | 0     | SOR  |
| easy1 (d)              | 0.12       | 0.01              | 0.05         | 10/10      | 1.12   | 5.55   | 1.27   | 0     | SOR  |
| easy2 (c)              | 0.04       | 0.02              | –            | 10/10      | 1.55   | 6.79   | 0.18   | 0     | SOS  |
| ndecr (d)              | 0.04       | 0.02              | 0.03         | 10/10      | 1.18   | 5.63   | 0.02   | 0     | SOR  |
| random1d (c)           | 0.05       | 0.02              | –            | 10/10      | 1.14   | 4.86   | 0.79   | 0     | SOS  |
| rsd (d)                | error      | 0.01              | oot          | 10/10      | 1.14   | 6.18   | 2.04   | 0     | SOR  |
| speedFails1 (d)        | 0.07       | 0.01              | 0.04         | 10/10      | 0.45   | 4.09   | 0.67   | 0     | SOR  |
| speedPldi2 (d)         | –          | 0.02              | 0.40         | 9/10       | 1.36   | 7.85   | 0.02   | 0     | SOR  |
| speedPldi3 (d)         | –          | 0.02              | 0.36         | 8/10       | 2.58   | 30.70  | 2.12   | 1     | SOR  |
| speedPldi4 (d)         | –          | 0.02              | 0.17         | 10/10      | 0.68   | 5.07   | 0.04   | 0     | SOR  |
| speedSingleSingle (c)  | 0.03       | 0.02              | –            | 10/10      | 0.39   | 2.85   | 0.51   | 0     | SOS  |
| speedSingleSingle2 (d) | –          | 0.02              | 0.15         | 10/10      | 0.83   | 7.30   | 0.04   | 0     | SOR  |
| wcet0 (d)              | –          | 0.02              | 0.10         | 10/10      | 1.45   | 5.64   | 0.09   | 0     | SOR  |
| wcet1 (d)              | –          | 0.02              | 0.10         | 10/10      | 0.85   | 4.31   | 0.09   | 0     | SOR  |
|------------------------|------------|-------------------|--------------|------------|--------|--------|--------|-------|------|
| probfact (d)           | –          | –                 | n/a          | 10/10      | 0.49   | 6.12   | 0.16   | 0     | SOR  |
| probfact2 (d)          | –          | –                 | n/a          | 10/10      | 0.45   | 5.89   | 0.23   | 0     | SOR  |
| marbles (d)            | –          | –                 | n/a          | 10/10      | 0.84   | 10.83  | 0.91   | 0     | SOR  |
| marbles3 (d)           | –          | –                 | n/a          | 10/10      | 0.40   | 70.14  | 7.87   | 2     | SOR  |
| crwalk (c)             |            |                   |              | 10/10      | 0.53   | 3.06   | 1.56   | 1     | SOS  |
| crwalk2 (c)            |            |                   |              | 10/10      | 1.32   | 3.11   | 0.75   | 1     | SOS  |
| expdistrw (c)          | n/a        | –                 | –            | 10/10      | 0.05   | 1.53   | 0.01   | 0     | SOS  |
| expdistrw2 (c)         | n/a        | –                 | –            | 10/10      | 4.92   | 3.15   | 1.03   | 1     | SOS  |
| gaussrw (c)            |            |                   |              | 10/10      | 10.30  | 3.45   | 0.75   | 0     | SOS  |
| gaussrw2 (c)           |            |                   |              | 9/10       | 15.46  | 4.91   | 5.33   | 0     | SOS  |
| slicedcauchy (c)       |            |                   |              | 10/10      | 0.02   | 3.31   | 0.01   | 0     | SOR  |
| slicedcauchy2 (c)      |            |                   |              | 10/10      | 0.01   | 2.16   | 0.03   | 0     | SOR  |


Benchmarks from Previous Work. We run our prototype on single-loop programs 
from the WTC benchmark suite [3], augmented with probabilistic branching and 
assignments [2]. These correspond to the programs in the first section of Table 1. 
We perturb assignment statements by adding noise sampled from a discrete
uniform distribution of support {−2, 2}, or a continuous uniform distribution on
the interval [−2, 2]. The while loops are also made probabilistic: with probability
1/2 the loop is executed, and with the remaining probability a skip command
is executed. 

We compare our framework against three existing tools. The first is AMBER 
[39]: where possible, we translate instances from the WTC suite into the lan- 
guage of AMBER, but this is not possible for some programs where the loop pred- 
icate is a logical conjunction or disjunction of predicates (indicated by dashes in 
Table 1). Second, we compare against a tool for synthesising affine lexicographic 
RSMs (referred to as Farkas’ lemma) for affine programs (i.e. containing only lin- 
ear expressions), based on reduction to linear programming via Farkas’ lemma 
[2]. This is applicable to probabilistic programs with nested loops, unlike our 
method. However, since it is limited to affine programs and affine lexicographic 
RSMs, it is not able to analyse all the programs we consider (again, indicated 
by dashes in Table 1). The third tool is ABSYNTH [41], for which we are able to 
encode all programs that were limited to discrete random variables. 

The experimental results (Table 1) show that for all the WTC benchmarks 
our approach has a success rate of at least 8/10, and is able to synthesise an RSM 
within 2 iterations (for the seed that results in median total execution time). For 
15 of the 18 WTC benchmarks no full CEGIS iterations are required. As expected,
our approach, particularly the learning component, is much slower than all three 
tools. However, our framework has broader applicability, as illustrated with the 
next set of experiments. 


Newly Defined Case Studies. The examples in the second section of Table 1 
(from Sect. 5) are not proven PAST by any of the three tools. Our approach 
is able to do so with a success rate of at least 9/10, under the “aggressive” 
rounding strategy. Of the new examples, marbles3 (Sect. 5) requires the longest
time, since we use an NRSM with h = 3 ReLU nodes (see Sect. 3), and six of
the nine parameters must be brought sufficiently close to zero to learn a valid
RSM. For gaussrw/gaussrw2, we find it necessary to set an SMT solver time
limit within the CEGIS loop (of 200 ms for gaussrw, and 5 s for gaussrw2), 
such that candidates taking longer than this to verify are skipped. The fact that 
these examples are harder to verify is unsurprising, given that they give rise 
to non-polynomial decision problems, containing equationally defined rational 
expressions. In comparing the two rounding strategies, we find that using the 
“aggressive” strategy tends to result in fewer CEGIS iterations, reducing the 
learner time, while increasing the verifier time: this is to be expected, since a 
larger number of candidates needs to be checked in each CEGIS iteration. 


7 Conclusion 


We have presented the first machine learning method for the termination anal- 
ysis of probabilistic programs. We have introduced a loss function for training 
neural networks so that they behave as RSMs over sampled execution traces; our 
training phase is agnostic to the program and thus easily portable to different 
programming languages. Reasoning about the program code is entirely delegated 
to our checking phase which, by SMT solving over a symbolic encoding of pro- 
gram and neural network, verifies whether the neural network is a sound RSM. 
Upon a positive answer, we have formally certified that the program is PAST; 
upon a negative answer, we obtain a counterexample that we use to resample 
traces and repeat training in a CEGIS loop. Our procedure runs indefinitely for 
programs that are not PAST, as these necessarily lack a ranking supermartin- 
gale, and may run indefinitely for some PAST programs. Nevertheless, we have 
experimentally demonstrated over several PAST benchmarks that our method 
is effective in practice and covers a broad range of programs w.r.t. existing tools. 

Our method naturally generalises to deeper networks, but whether these 
are necessary in practice remains an open question; notably, neural networks 
with one hidden layer were sufficient to solve our examples. We have exclu- 
sively tackled the PAST question, and techniques for almost-sure (but not nec- 
essarily PAST) termination and non-termination exist [16,37,39]. Our results 
pose the basis for future research in machine learning (and CEGIS) for the for- 
mal verification of probabilistic programs. Different verification questions will 
require different learning models. Our approach lends itself to extensions toward 
probabilistic safety, exploiting supermartingale inequalities, and towards the 
non-termination question, using repulsing supermartingales [16]. Adapting our 
method to termination analysis with infinite expected time is also a matter for 
future investigation [37]. Moreover, we have exclusively considered purely proba- 
bilistic single-loop programs: generalisations to programs with non-determinism, 
arbitrary control-flow, and concurrency are material for future work [15,20,35]. 


Abstract. Programs for multiprocessor machines commonly perform 
busy waiting for synchronization. We propose the first separation logic 
for modularly verifying termination of such programs under fair schedul- 
ing. Our logic requires the proof author to associate a ghost signal with 
each busy-waiting loop and allows such loops to iterate while their cor- 
responding signal s is not set. The proof author further has to define 
a well-founded order on signals and to prove that if the looping thread 
holds an obligation to set a signal s’, then s’ is ordered above s. By using 
conventional shared state invariants to associate the state of ghost signals 
with the state of data structures, programs busy-waiting for arbitrary 
conditions over arbitrary data structures can be verified. 


1 Introduction 


Programs for multiprocessor machines commonly perform busy waiting for syn- 
chronization [22,23]. In this paper, we propose a separation logic [24,31] to mod- 
ularly verify termination of such programs under fair scheduling. Specifically, we 
consider programs where some threads busy-wait for a certain condition C over 
a shared data structure to hold, e.g., a memory flag being set by other threads. 
By modularly, we mean that we reason about each thread and each function in 
isolation. That is, we do not reason about thread scheduling or interleavings. We 
only consider these issues when proving the soundness of our logic. Assuming 
fair scheduling is necessary since busy-waiting for a condition C only termi- 
nates if the thread responsible for establishing the condition is sufficiently often 
scheduled to establish C. 

Busy waiting is an example of blocking behaviour, where a thread’s progress 
requires interference from other threads. This is not to be confused with non- 
blocking concurrency, where a thread’s progress does not rely on—and may 
in fact be impeded by—interference from other threads. Existing proposed 
approaches for verifying termination of concurrent programs consider only pro- 
grams that only involve non-blocking concurrent objects [32], or primitive block- 
ing constructs of the programming language, such as acquiring built-in mutexes, 
receiving from built-in channels, joining threads, or waiting for built-in monitor 
condition variables [2,5,19], or both [11]. Existing techniques that do support 
busy waiting are not Hoare logics; instead, they verify termination-preserving
contextual refinements between more concrete and more abstract implementations
of busy-waiting concurrent objects [15,21]. In contrast, we here propose
the first conventional program logic for modular verification of termination of
programs involving busy waiting, using Hoare triples as module specifications.

© The Author(s) 2021

In order to prove that a busy-waiting loop terminates, we have to prove that 
it performs only finitely many iterations. To do this we introduce a special form 
of ghost resources [13] which we call ghost signals. As ghost resources they only 
exist on the verification level and hence do not affect the program’s runtime 
behaviour. Signals are initially unset and come with an obligation to set them. 
Setting a signal does not by definition correspond to any runtime condition. So, 
in order to use a signal s effectively, anyone using our approach has to prove an 
invariant stating that s is set if and only if the condition of interest holds. Further, 
the proof author must prove that every thread discharges all its obligations by 
performing the corresponding actions, e.g., by setting a signal and establishing 
the corresponding condition by setting the memory flag. 

In our verification approach we tie every busy-waiting loop to a finite set of 
ghost signals S that correspond to the set of conditions the loop is waiting for. 
Every iteration that does not terminate the loop must be justified by the proof 
author proving that some signal s ∈ S has indeed not yet been set. This way, 
we reduce proving termination to proving that no signal is waited for infinitely 
often. 

Our approach ensures that no thread directly or indirectly waits for itself by 
requiring the proof author (i) to choose a well-founded and partially ordered set 
of levels Levs and (ii) to assign a level to every signal and by (iii) only allowing 
a thread to wait for a signal if the signal’s level is lower than the level of each 
held obligation. This guarantees that every signal is waited for only finitely often 
and hence that every busy-waiting loop terminates. We use this to prove that 
every program that is verified using our approach indeed terminates. 
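
The level rule (iii) can be made concrete with a small check. The sketch below is a simplified illustration, not the logic itself: the function names are hypothetical, and levels are modelled as Python integers, which is just one instance of a well-founded partial order. The second function shows by brute force that a cyclic wait between two threads cannot be assigned levels satisfying the rule.

```python
from itertools import product
from typing import Iterable


def may_wait_for(signal_level: int, held_obligation_levels: Iterable[int]) -> bool:
    """Rule (iii): waiting is allowed only if the awaited signal's level is
    lower than the level of every obligation the waiting thread holds."""
    return all(signal_level < level for level in held_obligation_levels)


def cyclic_wait_is_rejected(levels: Iterable[int] = range(3)) -> bool:
    """Thread A holds the obligation for s_a and waits for s_b; thread B holds
    the obligation for s_b and waits for s_a. No level assignment permits both."""
    candidates = list(levels)
    return not any(
        may_wait_for(level_b, [level_a]) and may_wait_for(level_a, [level_b])
        for level_a, level_b in product(candidates, repeat=2)
    )


if __name__ == "__main__":
    print(may_wait_for(1, [2, 3]), may_wait_for(2, [2]), cyclic_wait_is_rejected())
```

A thread holding no obligations may wait for any signal, since the condition is then vacuously true.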

We start by gradually introducing the intuition behind our verification app- 
roach and the concepts we use. In Sect. 2.1 and Sect. 2.2 we present the main 
aspects of using signals to verify termination. We start by treating them as phys- 
ical thread-safe resources and only consider busy waiting for a signal to be set. 
Then, we drop thread-safety and explain how to prove data-race- and deadlock- 
freedom. In Sect. 2.3 and Sect. 2.4 we generalize our approach to busy waiting 
for arbitrary conditions over arbitrary data structures and then lift signals to 
the verification level by introducing ghost signals. 

In Sect. 3 we sketch the verification of a realistic producer-consumer example 
involving a bounded FIFO to demonstrate our approach’s usability and address 
fine-grained concurrency in Sect. 4. Further, we describe the available tool sup- 
port in Sect. 5 and discuss integrating higher-order features in Sect. 6. We con- 
clude by comparing our approach to related work and reflecting on it in Sect. 7 
and Sect. 8. 

We formally define our logic and prove its soundness in the extended version 
of this paper [28]. To keep the presentation in this paper simple, we assume busy- 
waiting loops to have a certain syntactical form. In our technical report [29] we 
present a generalised version of our logic and its soundness proof. Further, we
verify the realistic example presented in Sect. 3 in full detail in the extended
version of this paper and in the technical report, using the respective version of
our logic. We used our tool support to verify C versions of the bounded FIFO
example and the CLH lock. The tool we used and the annotated .c files can be
found at [10,26,27]. 


2 A Guide on Verifying Termination of Busy Waiting 


When we try to verify termination of busy-waiting programs, multiple challenges 
arise. Throughout this section, we describe these challenges and our approach to 
overcome them. In Sect. 2.1 we start by discussing the core ideas of our logic. In 
order to simplify the presentation we initially consider a simple language with 
built-in thread-safe signals and a corresponding minimal example where one 
thread busy-waits for such a signal. Signals are heap cells containing boolean 
values that are specially marked as being solely used for busy waiting. Through- 
out this section, we generalize our setting as well as our example towards one 
that allows us to verify programs with busy waiting for arbitrary conditions over
arbitrary shared data structures. In Sect. 2.2 we present the concepts necessary
to verify data-race freedom, deadlock freedom and termination in the presence of 
built-in signals that are not thread safe. In Sect. 2.3 we explain how to use these 
non-thread-safe signals to verify programs that wait for arbitrary conditions over 
shared data structures. We illustrate this by an example waiting for a shared 
heap cell to be set. In Sect. 2.4 we erase the signals from our program and lift 
them to the verification level in the form of a concept we call ghost signals. 


2.1 Simplest Setting: Thread-Safe Physical Signals 


We want to verify programs that busy-wait for arbitrary conditions over arbi- 
trary shared data structures. As a first step towards achieving this, we first 
consider programs that busy-wait for simple boolean flags, specially marked as 
being used for the purpose of busy waiting. We call these flags signals. For now, 
we assume that read and write operations on signals are thread-safe. Consider 
a simple programming language with built-in signals and with the following 
commands: (i) new_signal for creating a new unset signal, (ii) set_signal(x)
for setting x and (iii) await is_set(x) for busy-waiting until x is set. Figure 1 
presents a minimal example where two threads communicate via a shared sig- 
nal sig. The main thread creates the signal sig and forks a new thread that 
busy-waits for sig to be set. Then, the main thread sets the signal. As we assume 
signal operations to be thread-safe in this example, we do not have to care about 
potential data races. Notice that like all busy-waiting programs, this program is 
guaranteed to terminate only under fair thread scheduling: Indeed, it does not 
terminate if the main thread is never scheduled after it forks the new thread. In 
this paper we verify termination under fair scheduling. 


let sig := new_signal in
fork await is_set(sig);
set_signal(sig)


Fig. 1. Minimal example with two threads communicating via a physical thread-safe 
signal. 
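
For readers who prefer an executable analogue, the program of Fig. 1 can be mimicked in Python. This is only a rough sketch under stated assumptions: `threading.Event` stands in for the built-in thread-safe signal, the explicit polling loop stands in for `await is_set(sig)`, and the function name and the join timeout are hypothetical. It relies on the Python runtime eventually scheduling the main thread, which mirrors the fairness assumption discussed above.

```python
import threading
import time


def run_minimal_example(timeout: float = 5.0) -> bool:
    """Mimic Fig. 1: fork a busy-waiting thread, then set the signal.

    Returns True if the forked thread finished within the timeout.
    """
    sig = threading.Event()              # new_signal: created unset

    def waiter() -> None:
        while not sig.is_set():          # await is_set(sig) as a busy-wait loop
            time.sleep(0)                # yield so that other threads can run

    forked = threading.Thread(target=waiter)
    forked.start()                       # fork
    sig.set()                            # set_signal(sig)
    forked.join(timeout)
    return not forked.is_alive()


if __name__ == "__main__":
    print(run_minimal_example())
```

If the call to `sig.set()` were removed, the forked thread would spin forever, which is the non-terminating behaviour the logic is designed to rule out.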


Augmented Semantics 


Obligations. The only constructs in our language that can lead to non-termination
are busy-waiting loops of the form await is_set(sig). In order to prove that
programs terminate, it is therefore sufficient to prove that all created signals are
eventually set. We use so-called obligations [5,6,16,19] to ensure this. These are
ghost resources [13], i.e., resources that do not exist during runtime and hence
cannot influence a program’s runtime behaviour. They carry, however, information
relevant to the program’s verification. Generally, holding an obligation requires
a thread to discharge it by performing a certain action. For instance, when 
the main thread in our example creates signal sig, it simultaneously creates an 
obligation to set it. The only way to discharge this obligation is to set sig. 

We denote thread IDs by $\theta$ and describe which obligations a thread $\theta$ holds
by bundling them into an obligations chunk $\theta.\mathsf{obs}(O)$, where $O$ is a multiset of
signals. We denote multisets by double braces $\{\!\{\dots\}\!\}$ and multiset union by $\uplus$.
Each occurrence of a signal $s$ in $O$ corresponds to an obligation by thread $\theta$ to
set $s$. Consequently, $\theta.\mathsf{obs}(\emptyset)$ asserts that thread $\theta$ does not hold any obligations.


Augmented Semantics. In the real semantics of the programming language we 
consider here, ghost resources such as obligations do not exist during runtime. 
To prove termination, we consider an augmented version of it that keeps track 
of ghost resources during runtime. In this semantics, we maintain the invariant 
that every thread holds exactly one obs chunk. That is, for every running thread
$\theta$, our heap contains a unique heap cell $\theta.\mathsf{obs}$ that stores the thread’s bag of
obligations. Further, we let a thread get stuck if it tries to finish while it still holds
undischarged obligations. Note that we use the term finish to refer to thread- 
local behaviour while we write termination to refer to program-global behaviour, 
i.e., meaning that every thread finishes. For every augmented execution there 
trivially exists a corresponding execution in the real semantics. 

Figure 2 presents some of the reduction rules we use to define the augmented
semantics. We use $h$ to refer to augmented heaps, i.e., heaps that can contain
ghost resources. A reduction step of the form $h, c \leadsto^{\theta}_{\mathsf{aug}} h', c', T$ expresses that
thread $\theta$ reduces heap $h$ (which is shared by all threads) and command $c$ to heap
$h'$ and command $c'$. Further, $T$ represents the set of threads forked during this
step. It is either empty or a singleton containing the new thread’s ID and the
command it is going to execute, i.e., $\{(\theta_f, c_f)\}$. We omit it whenever it is clear
from the context that no thread is forked. Further, we denote disjoint union of
sets by $\sqcup$.

Our reduction rules comply with the intuition behind obligations we outlined
above. AUG-RED-NEWSIGNAL creates a new signal and simultaneously a corresponding
obligation. The only way to discharge it is by setting the signal using
AUG-RED-SETSIGNAL.


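AUG-RED-NEWSIGNAL
$$\frac{id \notin \mathsf{ids}(h) \qquad L \in \mathcal{L}evs}{h \sqcup \{\theta.\mathsf{obs}(O)\},\ \texttt{new\_signal} \;\leadsto^{\theta}_{\mathsf{aug}}\; h \sqcup \{\theta.\mathsf{obs}(O \uplus \{\!\{(id, L)\}\!\}),\ \mathsf{signal}((id, L))\},\ id}$$

AUG-RED-SETSIGNAL
$$h \sqcup \{\theta.\mathsf{obs}(O \uplus \{\!\{s\}\!\})\},\ \texttt{set\_signal}(s.\mathsf{id}) \;\leadsto^{\theta}_{\mathsf{aug}}\; h \sqcup \{\theta.\mathsf{obs}(O),\ \mathsf{signalSet}(s)\},\ \mathsf{tt}$$

AUG-RED-FORK
$$\frac{\theta_f \notin \mathsf{thids}(h)}{h \sqcup \{\theta.\mathsf{obs}(O \uplus O_f)\},\ \texttt{fork}\ c \;\leadsto^{\theta}_{\mathsf{aug}}\; h \sqcup \{\theta.\mathsf{obs}(O),\ \theta_f.\mathsf{obs}(O_f)\},\ \mathsf{tt},\ \{(\theta_f, c)\}}$$

AUG-RED-AWAIT
$$\frac{\theta.\mathsf{obs}(O) \in h \qquad \mathsf{signal}(s) \in h \qquad \mathsf{signalSet}(s) \notin h \qquad s.\mathsf{lev} \prec_L O}{h,\ \texttt{await is\_set}(s.\mathsf{id}) \;\leadsto^{\theta}_{\mathsf{aug}}\; h,\ \texttt{await is\_set}(s.\mathsf{id})}$$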


Fig. 2. Reduction rules for augmented semantics. 


Forking. Whenever a thread forks a new thread, it can pass some of its obligations
to the newly forked thread, cf. AUG-RED-FORK. Forking a new thread with
ID $\theta_f$ also allocates a new heap cell $\theta_f.\mathsf{obs}$ to store its bag of obligations. Since
this is the only way to allocate a new obs heap cell, we will never run into a heap
$h \sqcup \{\theta.\mathsf{obs}(O)\} \sqcup \{\theta.\mathsf{obs}(O')\}$ that contains multiple obligations chunks belonging
to the same thread $\theta$. Remember that threads cannot finish while holding
obligations. This prevents them from dropping obligations via dummy forks.


Levels. In order to prove that a busy-waiting loop await is_set(sig) terminates,
we must ensure that the waiting thread does not directly or indirectly wait for
itself. We could just check that it does not hold an obligation for the signal it
is waiting for, but that is not sufficient as the following example demonstrates:
Consider a program with two signals $sig_1$, $sig_2$ and two threads. Let one thread
hold the obligation for $sig_1$ and execute await is_set($sig_2$); set_signal($sig_1$).
Likewise, let the other thread hold the obligation for $sig_2$ and let it execute
await is_set($sig_1$); set_signal($sig_2$). Neither thread waits for a signal it is
itself obliged to set, yet both wait for each other forever.

To prevent such wait cycles modularly, we apply the usual approach [3, 4, 19].
For every program that we want to execute in our augmented semantics, we
choose a partially ordered set of levels $\mathcal{L}evs$. Further, during every reduction
step in the augmented semantics that creates a signal $s$, we pick a level $L \in
\mathcal{L}evs$ and associate it with $s$. Note that much like obligations, levels do not exist
during runtime in the real semantics. Signal chunks in the augmented semantics
have the form $\mathsf{signal}((id, L))$ where $id$ is the unique signal identifier returned
by new_signal. The level assigned to any signal can be chosen freely, cf.
AUG-RED-NEWSIGNAL. In practice, determining levels boils down to solving a set of
constraints that reflect the dependencies. In our example, however, the choice
is trivial as it only involves a single signal. We choose $\mathcal{L}evs = \{0\}$ and 0 as
level for sig and thereby get $\mathsf{signal}((sig, 0))$. Generally, we denote signal tuples
by $s = (id, L)$. Now we can rule out cyclic wait dependencies by only allowing a
thread to busy-wait for a signal $s$ if its level $s.\mathsf{lev}$ is smaller than the level of each
held obligation, cf. AUG-RED-AWAIT¹. Given a bag of obligations $O$, we denote
this by $s.\mathsf{lev} \prec_L O$.
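
The interplay of obligations and levels can be made concrete with a small executable model. The following Python sketch is only an illustration under assumptions: the names `AugHeap`, `Stuck`, `await_step`, `fig1` and `cyclic` are hypothetical, levels are taken to be integers ordered by `<` (the text only requires a partial order), and the step functions loosely follow the side conditions of Fig. 2 rather than reproducing the formal semantics.

```python
from collections import Counter


class Stuck(Exception):
    """A side condition of an augmented reduction step is violated."""


class AugHeap:
    """Toy augmented heap; integer levels ordered by < are an assumption."""

    def __init__(self):
        self.obs = {0: Counter()}   # thread id -> bag (multiset) of obligations
        self.level = {}             # signal id -> level
        self.set_signals = set()    # signals that have been set
        self.next_tid = 1

    def new_signal(self, tid, level):
        sid = len(self.level)       # fresh signal id
        self.level[sid] = level
        self.obs[tid] = self.obs[tid] + Counter([sid])   # spawn obligation
        return sid

    def set_signal(self, tid, sid):
        if self.obs[tid][sid] == 0:
            raise Stuck("thread holds no obligation for this signal")
        self.obs[tid] = self.obs[tid] - Counter([sid])   # discharge obligation
        self.set_signals.add(sid)

    def fork(self, tid, passed):
        passed = Counter(passed)
        if any(self.obs[tid][s] < n for s, n in passed.items()):
            raise Stuck("cannot pass obligations that are not held")
        self.obs[tid] = self.obs[tid] - passed
        child, self.next_tid = self.next_tid, self.next_tid + 1
        self.obs[child] = passed    # the new thread gets its own obs cell
        return child

    def await_step(self, tid, sid):
        """One loop iteration; True means the signal is set and the loop exits."""
        if sid in self.set_signals:
            return True
        if any(not self.level[sid] < self.level[o] for o in self.obs[tid]):
            raise Stuck("waited-for level is not below all held obligations")
        return False

    def finish(self, tid):
        if self.obs[tid]:
            raise Stuck("thread finishes with undischarged obligations")
        del self.obs[tid]


def fig1():
    h = AugHeap()
    sig = h.new_signal(0, level=0)
    child = h.fork(0, [])                  # forked thread receives no obligations
    first = h.await_step(child, sig)       # signal not yet set: keep waiting
    h.set_signal(0, sig)
    second = h.await_step(child, sig)      # signal set: loop exits
    h.finish(child)
    h.finish(0)
    return (first, second)


def cyclic(l1, l2):
    """The two-signal wait cycle; True iff some await step is rejected."""
    h = AugHeap()
    s1 = h.new_signal(0, l1)
    s2 = h.new_signal(0, l2)
    t1 = h.fork(0, [s2])                   # thread t1 takes the obligation for s2
    try:
        h.await_step(0, s2)                # thread 0 still holds s1
        h.await_step(t1, s1)               # thread t1 holds s2
    except Stuck:
        return True
    return False
```

In this model the run of the program from Fig. 1 passes every check, whereas the cyclic example is rejected for every choice of integer levels, since it would require both levels to be strictly smaller than each other.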


Proving Termination. As we will explain below, the augmented semantics has 
no fair infinite executions. We can use this as follows to prove that a program 
c terminates under fair scheduling: For every fair infinite execution of c, show 
that we can construct a corresponding augmented execution. (This requires that 
each step’s side conditions in the augmented semantics are satisfied. Note that 
we thereby prove certain properties for the real execution, like absence of cyclic 
wait dependencies.) As there are no fair infinite executions in the augmented 
semantics, we get a contradiction. It follows that c has no fair infinite executions 
in the real semantics. 


Soundness. In order to prove soundness of our approach, we must prove that 
there indeed are no fair infinite executions in the augmented semantics. This 
boils down to proving that no signal can be waited for infinitely often. Consider 
any program and any fair augmented execution of it. Consider the execution’s 
program order graph, (i) whose nodes are the execution steps and (ii) which has 
an edge from a step to the next step of the same thread and to the first step 
of the forked thread, if it is a fork step. Notice that for each obligation created 
during the execution, the set of nodes corresponding to a step made by a thread 
while that thread holds the obligation constitutes a path that ends when the 
obligation is discharged. We say that this path carries the obligation. 

It is not possible that a signal is waited for infinitely often. Indeed, suppose
some signals $S^\infty$ are. Take $s_{\min} \in S^\infty$ with minimal level. Since $s_{\min}$ is never
set, the path in the program order graph that carries the obligation must be
infinite as well. Indeed, suppose it is finite. The final node N of the path cannot
discharge the obligation without setting the signal, so it must pass the obligation
on either to the next step of the same thread or to a newly forked thread. By
fairness of the scheduler, both of these threads will eventually be scheduled. This
contradicts N being the final node of the path.

The path carrying the obligation for $s_{\min}$ waits only for signals that are
waited for finitely often. (Remember that AUG-RED-AWAIT requires the signal
waited for to be of a lower level than all held obligations, i.e., a lower level than
that of $s_{\min}$.) It is therefore a finite path. A contradiction.


¹ For simplicity, our augmented semantics assumes that the level order and the level
associated with any object remain fixed for the entire execution. However, following
the approach presented in [18], it would be sound to add a step rule that allows a 
thread to change the level of an object it has exclusive access to (cf. Sect. 2.2). 

Notice that the above argument relies on the property that every non-empty
set of levels has a minimal element. For this reason, for termination verification 
we require that Levs is not just partially ordered, but also well-founded. 


Program Logic 

Directly using the augmented semantics to prove that our example program 
terminates is cumbersome. In the following, we present a separation logic that 
simplifies this task. 


Safety. We call a program c safe under a (partial) heap h if it provides all
the resources necessary such that both c and any threads it forks can execute
without getting stuck in the augmented semantics. (This depends on the angelic
choices.) We denote this by safe(h, c) [33]².

Consider a program c that is safe under an augmented heap $h$. Let $h_{\mathsf{real}}$ be the real
heap that matches $h$ apart from the ghost resources. Then, for every real execution
that starts with $h_{\mathsf{real}}$ we can construct a corresponding augmented execution.


Specifications. We use Hoare triples $\{A\}\ c\ \{\lambda r.\ B(r)\}$ [8] to specify the behaviour
of a program c. Such a triple expresses the following: Consider any evaluation
context $E$, such that for every return value $v$, running $E[v]$ from a state that satisfies
$B(v)$ is safe. Then, running $E[c]$ from a state that satisfies $A$ is safe.


Proof System. We define a proof relation $\vdash$ which ensures that whenever
we can prove $\vdash \{A\}\ c\ \{\lambda r.\ B(r)\}$, then c complies with the specification
$\{A\}\ c\ \{\lambda r.\ B(r)\}$. Figure 3b presents some of the proof rules we use to define $\vdash$.
As we evolve our setting throughout this section, we also adapt our proof rules.
Rules that will be changed later are marked with a prime in their name. The
full set of rules is presented in the extended version of this paper [28]. Our proof
rules PR-SETSIGNAL’ and PR-AWAIT’ are similar to the rules for sending and
receiving on a channel presented in [19].

Notice how the proof rules enforce the side-conditions of the augmented
semantics. Hence, all we have to do to prove that a program c terminates is
to prove that every thread eventually discharges all its obligations. That is, we
have to prove $\vdash \{\mathsf{obs}(\emptyset)\}\ c\ \{\mathsf{obs}(\emptyset)\}$. Figure 3a illustrates how we can apply our
rules to verify that our minimal example terminates.


2.2 Non-Thread-Safe Physical Signals 


As a step towards supporting waiting for arbitrary conditions over shared data 
structures, including non-thread-safe ones, we now move to non-thread-safe sig- 
nals. For simplicity, in this paper we consider programs that use mutexes to syn- 
chronize concurrent accesses to shared data structures. (Our ideas apply equally 
to programs that use other constructs, such as atomic machine instructions.) 
Figure 4 presents our updated example. 


² For a formal definition see this paper’s extended version [28] and the technical
report [29]. 

$\{\mathsf{obs}(\emptyset)\}$
let sig := new_signal in                          PR-NEWSIGNAL’ with $L = 0$
$\{\mathsf{obs}(\{\!\{(sig, 0)\}\!\}) * \mathsf{signal}((sig, 0))\}$          $s := (sig, 0)$
fork ($\{\mathsf{obs}(\emptyset) * \mathsf{signal}(s)\}$
      await is_set(sig)                           $s.\mathsf{lev} = 0 \prec_L \emptyset$
      $\{\mathsf{obs}(\emptyset) * \mathsf{signal}(s)\}$);
$\{\mathsf{obs}(\{\!\{s\}\!\})\}$
set_signal(sig)
$\{\mathsf{obs}(\emptyset)\}$


(a) Proof outline for program from Fig. 1. Applied proof rule marked in purple. Abbre- 
viation marked in brown. General hint marked in red. 


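PR-NEWSIGNAL’
$$\frac{L \in \mathcal{L}evs}{\vdash \{\mathsf{obs}(O)\}\ \texttt{new\_signal}\ \{\lambda r.\ \mathsf{obs}(O \uplus \{\!\{(r, L)\}\!\}) * \mathsf{signal}((r, L))\}}$$

PR-SETSIGNAL’
$$\vdash \{\mathsf{obs}(O \uplus \{\!\{s\}\!\})\}\ \texttt{set\_signal}(s.\mathsf{id})\ \{\mathsf{obs}(O)\}$$

PR-FORK’
$$\frac{\vdash \{\mathsf{obs}(O_f) * A\}\ c\ \{\mathsf{obs}(\emptyset) * B\}}{\vdash \{\mathsf{obs}(O_m \uplus O_f) * A\}\ \texttt{fork}\ c\ \{\mathsf{obs}(O_m)\}}$$

PR-AWAIT’
$$\frac{s.\mathsf{lev} \prec_L O}{\vdash \{\mathsf{obs}(O) * \mathsf{signal}(s)\}\ \texttt{await is\_set}(s.\mathsf{id})\ \{\mathsf{obs}(O) * \mathsf{signal}(s)\}}$$

PR-LET
$$\frac{\vdash \{A\}\ c\ \{\lambda r.\ C(r)\} \qquad \forall v.\ \vdash \{C(v)\}\ c'[v/x]\ \{B\}}{\vdash \{A\}\ \texttt{let}\ x := c\ \texttt{in}\ c'\ \{B\}}$$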


(b) Proof rules. Rules only used in this section marked with ’. 


Fig. 3. Verifying termination of minimal example with physical thread-safe signal. 
(Color figure online) 


let sig := new_signal in
let mut := new_mutex in
fork with mut await is_set(sig);
acquire mut;
set_signal(sig);
release mut

(a) Code.

with mut await c := (while acquire mut;
                           let r := c in
                           release mut;
                           ¬r
                     do skip)

(b) Syntactic sugar. r not free in mut.


Fig. 4. Minimal example with two threads communicating via a physical non-thread- 
safe signal protected by a mutex. 

As signal sig is no longer thread-safe, the two threads can no longer use it
directly to communicate. Instead, we have to synchronize accesses to avoid data 
races. Hence, we protect the signal by a mutex mut created by the main thread. 
In each iteration, the forked thread acquires the mutex, checks whether sig has 
been set and releases it again. After forking, the main thread acquires the mutex, 
sets the signal and releases it again. 
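
To make the desugared loop from Fig. 4b tangible, here is a rough Python analogue. It is a sketch under assumptions: `with_await` and `run_fig4` are hypothetical helper names, `threading.Lock` stands in for the mutex, a plain dictionary entry stands in for the non-thread-safe signal, and `time.sleep(0)` is merely a hint to the scheduler.

```python
import threading
import time


def with_await(mut, cond):
    """with mut await cond: acquire mut, evaluate cond, release mut, repeat."""
    while True:
        mut.acquire()
        try:
            r = cond()             # let r := c in
        finally:
            mut.release()          # release mut
        if r:                      # the loop continues while not r
            return
        time.sleep(0)              # yield before the next iteration


def run_fig4():
    state = {"sig": False}         # non-thread-safe signal, initially unset
    mut = threading.Lock()         # let mut := new_mutex in

    t = threading.Thread(target=with_await,
                         args=(mut, lambda: state["sig"]))
    t.start()                      # fork with mut await is_set(sig)

    mut.acquire()                  # acquire mut
    state["sig"] = True            # set_signal(sig)
    mut.release()                  # release mut

    t.join(timeout=5.0)
    return state["sig"] and not t.is_alive()
```

Every access to the flag happens while the lock is held, which is the discipline the proof rules in this subsection enforce through ownership of the signal chunk.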


Exposing Signal Values. Signals are specially marked heap cells storing boolean 
values. We make this explicit by extending our signal chunks from signal(s) to 
signal(s,b) where b is the current value of s and by updating our proof rules 
accordingly. Upon creation, signals are unset. Hence, creating a signal sig now 
spawns an unset signal chunk signal((sig, L), False) for some freely chosen level L 
and an obligation for (sig, L), cf. PR-NEWSIGNAL”. We present our new proof
rules in Fig. 6 and demonstrate their application in Fig. 5.


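$\{\mathsf{obs}(\emptyset)\}$
let sig := new_signal in                                        PR-NEWSIGNAL” with $L = 1$
$\{\mathsf{obs}(\{\!\{(sig, 1)\}\!\}) * \mathsf{signal}((sig, 1), \mathsf{False})\}$             PR-VIEWSHIFT & VS-SEMIMP
$\{\mathsf{obs}(\{\!\{(sig, 1)\}\!\}) * \exists b.\ \mathsf{signal}((sig, 1), b)\}$              $s := (sig, 1)$, $P := \exists b.\ \mathsf{signal}(s, b)$
let mut := new_mutex in                                         PR-NEWMUTEX” with $L = 0$
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}(m, P)\}$                                   PR-VIEWSHIFT
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}(m, P) * \mathsf{mutex}(m, P)\}$                   & VS-CLONEMUT”
fork ($\{\mathsf{obs}(\emptyset) * \mathsf{mutex}(m, P)\}$
      with m await                                              $m.\mathsf{lev}, s.\mathsf{lev} \prec_L \emptyset$
        $\{\mathsf{obs}(\{\!\{m\}\!\}) * P\}$                                       PR-EXISTS
        $\forall b.\ \{\mathsf{obs}(\{\!\{m\}\!\}) * \mathsf{signal}(s, b)\}$
        is_set(sig)
        $\{\lambda r.\ \mathsf{obs}(\{\!\{m\}\!\}) * \mathsf{signal}(s, b) \wedge r = b\}$         PR-VIEWSHIFT & VS-SEMIMP
        $\{\lambda r.\ \mathsf{obs}(\{\!\{m\}\!\}) * \text{if } r \text{ then } P \text{ else } \mathsf{signal}(s, \mathsf{False})\}$
      $\{\mathsf{obs}(\emptyset) * \mathsf{mutex}(m, P)\}$                                PR-VIEWSHIFT & VS-SEMIMP
      $\{\mathsf{obs}(\emptyset)\}$);
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}(m, P)\}$
acquire mut;                                                    $m.\mathsf{lev} = 0 < 1 = s.\mathsf{lev}$
$\{\mathsf{obs}(\{\!\{s, m\}\!\}) * \mathsf{locked}(m, P) * \exists b.\ \mathsf{signal}(s, b)\}$       PR-EXISTS
$\forall b.\ \{\mathsf{obs}(\{\!\{s, m\}\!\}) * \mathsf{locked}(m, P) * \mathsf{signal}(s, b)\}$
set_signal(sig);
$\{\mathsf{obs}(\{\!\{m\}\!\}) * \mathsf{locked}(m, P) * \mathsf{signal}(s, \mathsf{True})\}$         PR-VIEWSHIFT & VS-SEMIMP
$\{\mathsf{obs}(\{\!\{m\}\!\}) * \mathsf{locked}(m, P) * P\}$
release mut
$\{\mathsf{obs}(\emptyset) * \mathsf{mutex}(m, P)\}$                                      PR-VIEWSHIFT & VS-SEMIMP
$\{\mathsf{obs}(\emptyset)\}$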


Fig. 5. Proof outline for program Fig. 4, verifying termination with mutexes & non- 
thread safe signals. Applied proof and view shift rules marked in purple. Abbreviations 
marked in brown. General hints marked in red. (Color figure online) 
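
PR-NEWSIGNAL”
$$\frac{L \in \mathcal{L}evs}{\vdash \{\mathsf{obs}(O)\}\ \texttt{new\_signal}\ \{\lambda id.\ \mathsf{obs}(O \uplus \{\!\{(id, L)\}\!\}) * \mathsf{signal}((id, L), \mathsf{False})\}}$$

PR-SETSIGNAL”
$$\vdash \{\mathsf{obs}(O \uplus \{\!\{s\}\!\}) * \mathsf{signal}(s, \_)\}\ \texttt{set\_signal}(s.\mathsf{id})\ \{\mathsf{obs}(O) * \mathsf{signal}(s, \mathsf{True})\}$$

PR-ISSIGNALSET”
$$\vdash \{\mathsf{signal}(s, b)\}\ \texttt{is\_set}(s.\mathsf{id})\ \{\lambda r.\ \mathsf{signal}(s, b) \wedge r = b\}$$

PR-AWAIT”
$$\frac{\begin{array}{c} m.\mathsf{lev}, s.\mathsf{lev} \prec_L O \qquad \mathsf{signal}(s, \mathsf{False}) * R \Rrightarrow P \\ \vdash \{\mathsf{obs}(O \uplus \{\!\{m\}\!\}) * P\}\ c\ \{\lambda r.\ \mathsf{obs}(O \uplus \{\!\{m\}\!\}) * \text{if } r \text{ then } P \text{ else } \mathsf{signal}(s, \mathsf{False}) * R\} \end{array}}{\vdash \{\mathsf{obs}(O) * \mathsf{mutex}(m, P)\}\ \texttt{with}\ m.\mathsf{loc}\ \texttt{await}\ c\ \{\mathsf{obs}(O) * \mathsf{mutex}(m, P)\}}$$

(a) Signals & busy waiting.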


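PR-NEWMUTEX”
$$\frac{L \in \mathcal{L}evs}{\vdash \{P\}\ \texttt{new\_mutex}\ \{\lambda \ell.\ \mathsf{mutex}((\ell, L), P)\}}$$

PR-ACQUIRE”
$$\vdash \{\mathsf{obs}(O) * \mathsf{mutex}(m, P) \wedge m.\mathsf{lev} \prec_L O\}\ \texttt{acquire}\ m.\mathsf{loc}\ \{\mathsf{obs}(O \uplus \{\!\{m\}\!\}) * \mathsf{locked}(m, P) * P\}$$

PR-RELEASE”
$$\vdash \{\mathsf{obs}(O \uplus \{\!\{m\}\!\}) * \mathsf{locked}(m, P) * P\}\ \texttt{release}\ m.\mathsf{loc}\ \{\mathsf{obs}(O) * \mathsf{mutex}(m, P)\}$$

(b) Mutexes.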


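PR-FRAME
$$\frac{\vdash \{A\}\ c\ \{B\}}{\vdash \{A * F\}\ c\ \{B * F\}}$$

PR-EXISTS
$$\frac{\forall a \in A.\ \vdash \{a\}\ c\ \{B\}}{\vdash \{\bigvee A\}\ c\ \{B\}}$$

PR-FORK
$$\frac{\vdash \{\mathsf{obs}(O_f) * A\}\ c\ \{\mathsf{obs}(\emptyset)\}}{\vdash \{\mathsf{obs}(O_m \uplus O_f) * A\}\ \texttt{fork}\ c\ \{\mathsf{obs}(O_m)\}}$$

PR-VIEWSHIFT
$$\frac{A \Rrightarrow A' \qquad \vdash \{A'\}\ c\ \{B'\} \qquad B' \Rrightarrow B}{\vdash \{A\}\ c\ \{B\}}$$

(c) Standard rules.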


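VS-SEMIMP
$$\frac{\forall H.\ \mathsf{consistent}_{\mathsf{lh}}(H) \wedge H \vDash_{\mathsf{A}} A \rightarrow H \vDash_{\mathsf{A}} B}{A \Rrightarrow B}$$

VS-TRANS
$$\frac{A \Rrightarrow C \qquad C \Rrightarrow B}{A \Rrightarrow B}$$

VS-CLONEMUT”
$$\mathsf{mutex}(m, P) \Rrightarrow \mathsf{mutex}(m, P) * \mathsf{mutex}(m, P)$$

(d) View shifts.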


Fig. 6. Proof rules and view shift rules for mutexes and non-thread safe signals. Rules
only used in this section marked with ”.

Data Races. As read and write operations on signals are no longer thread-safe,
our logic has to ensure that two threads never try to access sig at the same time. 
Hence, in our logic possession of a signal chunk signal(s, b) expresses (temporary) 
exclusive ownership of s. Further, our logic requires threads to own any signal 
they are trying to access. Specifically, when a thread wants to set sig, it must hold 
a chunk of the form signal((sig, L), b), cf. PR-SETSIGNAL”. The same holds for 
reading a signal’s value, cf. PR-ISSIGNALSET”. Note that signal chunks are not 
duplicable and only created upon creation of the signal they refer to. Therefore, 
holding a signal chunk for sig indeed guarantees that the holding thread has the 
exclusive right to access sig (while holding the signal chunk). 


Synchronization and Lock Invariants. After the main thread creates sig, it exclu- 
sively owns the signal. The main thread can transfer ownership of this resource 
during forking, cf. PR-FORK’, and thereby allow the forked thread to busy-wait 
for sig. This would, however, leave the main thread without any permission to 
set the signal and thereby discharge its obligation. 

We use mutexes to let multiple threads share ownership of a common set of 
resources in a synchronized fashion. Every mutex is associated with a lock invari- 
ant P, an assertion chosen by the proof author that specifies which resources the 
mutex protects. In our example, we want both threads to share sig. To reflect 
the fact that the signal’s value changes over time, we choose a lock invariant 
that abstracts over its concrete value. We choose $P := \exists b.\ \mathsf{signal}((sig, L), b)$.
Let us ignore the chosen signal level $L$ for now. Creating the mutex mut consumes
this lock invariant and binds it to mut by creating a mutex chunk
$\mathsf{mutex}((mut, \dots), P)$, cf. PR-NEWMUTEX”. Thereby, the main thread loses
access to sig. The only way to regain access is by acquiring mut, cf.
PR-ACQUIRE”. Once the thread releases mut, it again loses access to all resources
protected by the mutex, cf. PR-RELEASE”.


Deadlocks. We have to ensure that any acquired mutex is eventually released
again. Hence, acquiring a mutex spawns a release obligation for this mutex
and the only way to discharge this obligation is indeed by releasing it, cf. PR- 
ACQUIRE” and PR-RELEASE”. 

Any attempt to acquire a mutex will block until the mutex becomes available. 
In order to prove that our program terminates, we have to prove that it does 
not get stuck during an acquisition attempt. To prevent wait cycles involving
mutexes, we require the proof author to associate every mutex (just like
every signal) with a level $L$. This level can be freely chosen during the mutex’s creation,
cf. PR-NEWMUTEX”. Mutex chunks therefore have the form $\mathsf{mutex}((\ell, L), P)$
where $\ell$ is the heap location the mutex is stored at. Their only purpose is to
record the level and lock invariant a mutex is associated with. Hence, these
chunks can be freely duplicated as we will see later. Generally, we denote mutex
tuples by $m = (\ell, L)$. We only allow a thread to acquire a mutex if its level is lower than
the level of each held obligation, cf. PR-ACQUIRE”. This also prevents any thread
from attempting to acquire a mutex twice, e.g., acquire mut; acquire mut or
with mut await acquire mut.

View Shifts. When verifying a program, it can be necessary to reformulate the
proof state and to draw semantic conclusions. To allow this we introduce a
so-called view shift relation $\Rrightarrow$ [14]. By applying proof rule PR-VIEWSHIFT and
VS-SEMIMP we can strengthen the precondition and weaken the postcondition. In
our example, we use this to convert the unset signal chunk into the lock invariant
which abstracts over the signal’s value, i.e., $\mathsf{signal}(s, \mathsf{False}) \Rrightarrow \exists b.\ \mathsf{signal}(s, b)$.

The logic we present in this work is an intuitionistic separation logic that
allows us to drop chunks.³ This allows us to simplify the postcondition of our
fork proof rule’s premise from $\mathsf{obs}(\emptyset) * B$ to $\mathsf{obs}(\emptyset)$, cf. PR-FORK, and drop all
unneeded chunks via a semantic implication $\mathsf{obs}(\emptyset) * B \Rrightarrow \mathsf{obs}(\emptyset)$.

We also allow mutex chunks to be cloned via view shifts, cf. VS-CLONEMUT”.
In our example, this is necessary to inform both threads which level and lock 
invariant mutex mut is associated with. That is, the main thread clones the 
mutex chunk mutex(m,P) and passes one chunk on when it forks the busy- 
waiting thread. 

In Sect. 2.4 we extend our view shift relation and revisit our interpretation of
what a view shift expresses. The full set of rules we use to define $\Rrightarrow$ is presented
in the extended version of this paper [28].


Busy Waiting. In the approach presented in this paper, for simplicity we only
support busy-waiting loops of the form with mut await c, which is syntactic
sugar for while acquire mut; let r := c in release mut; ¬r do skip where r
denotes a fresh variable.⁴ In each iteration, the loop tries to acquire mut, executes
c, releases mut again and lets the result returned by c determine whether the
loop continues. Such loops can fail to terminate for two reasons: (i) Acquiring
mut can get stuck and (ii) the loop could diverge.

We prevent the loop from getting stuck by requiring mut’s level to be lower 
than the level of each held obligation, cf. PR-AWAIT”. Further, we enforce ter- 
mination by requiring the loop to wait for a signal. That is, when verifying a 
busy-waiting loop using our approach, the proof author must choose a fixed sig- 
nal and prove that this signal remains unset at the end of every non-finishing 
iteration. This way, we can prove that the loop terminates by proving that every 
signal is eventually set, just as in Sect. 2.1. And just as before, our logic requires 
the level of the waited-for signal to be lower than the level of each held obligation. 

Acquiring the mutex in every iteration makes the lock invariant available 
during the verification of the loop body c. This lock invariant has to be restored 
at the end of the iteration such that it can be consumed during the mutex’s 
release. PR-AWAIT” allows for an additional view shift to restore the invariant. 
In our example, we end our busy-waiting loop’s non-finishing iterations with the
assertion $\mathsf{signal}(s, \mathsf{False})$. We use a semantic implication view shift to convert the
signal chunk into the mutex invariant $\exists b.\ \mathsf{signal}(s, b)$.


³ This allows a thread to drop its obligations chunk obs(O). Note, however, that by
dropping this chunk the thread does not drop its obligations, but only its ability to 
show what its obligations are. In particular the thread would be unable to present 
an empty obligations chunk upon termination. 

⁴ As we discuss in Sect. 5, in the technical report accompanying this paper we present
a more general logic that imposes no such syntactic restrictions. 

Choosing Levels. In our example, we have to assign levels to the mutex mut
and to the signal sig. Our proof rules for mutex acquisition and busy waiting
impose some restrictions on the levels of the involved mutexes and signals.
By analysing the corresponding rule applications that occur in our proof, we
can derive which constraints our level choice must comply with. Our example’s
verification involves one application of PR-ACQUIRE” and one application of
PR-AWAIT”: (i) Our main thread tries to acquire mut while holding an obligation
to set sig. (ii) The forked thread busy-waits for sig while not holding any
obligations. Our assignment of levels must therefore satisfy the single constraint
$m.\mathsf{lev} \prec_L s.\mathsf{lev}$. So, we choose $\mathcal{L}evs = \{0, 1\}$, $m.\mathsf{lev} = 0$ and $s.\mathsf{lev} = 1$.
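
For larger proofs, the collected constraints can be solved mechanically. The sketch below is one possible way to do so and not the procedure used in this paper: `assign_levels` is a hypothetical helper that assumes natural-number levels and computes, by repeated relaxation, the smallest assignment satisfying all constraints of the form "a must lie below b", failing if the constraints are cyclic.

```python
def assign_levels(objects, constraints):
    """Assign natural-number levels to objects.

    constraints is a list of pairs (a, b), each meaning that the level of a
    must be strictly smaller than the level of b.
    """
    level = {o: 0 for o in objects}
    # In the acyclic case, len(objects) relaxation passes always suffice.
    for _ in range(len(objects)):
        changed = False
        for a, b in constraints:
            if level[b] <= level[a]:
                level[b] = level[a] + 1   # lift b just above a
                changed = True
        if not changed:
            return level
    raise ValueError("cyclic level constraints")
```

For the single constraint of our example, `assign_levels(["m", "s"], [("m", "s")])` yields level 0 for the mutex and level 1 for the signal, matching the choice made above.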


2.3 Arbitrary Data Structures 


The proof rules we introduced in Sect. 2.2 allow us to verify programs busy- 
waiting for arbitrary conditions over arbitrary shared data structures as follows: 
For every condition C the program waits for, the proof author inserts a signal 
s into the program. They ensure that s is set at the same time the program 
establishes C and prove an invariant stating that the signal’s value expresses 
whether C holds. Then, the waiting thread can use s to wait for C. We illustrate 
this here for the simplest case of setting a single heap cell in Fig. 7a. 


let x := cons(0) in
let mut := new_mutex in
fork with mut await [x] = 1;
acquire mut;
[x] := 1;
release mut

(a) Example program with busy waiting for heap cell x to be set.

let x := cons(0) in
let sig := new_signal in
let mut := new_mutex in
fork with mut await [x] = 1;
acquire mut;
[x] := 1;
set_signal(sig);
release mut

(b) Example program 7a with additional signal sig inserted, marked in
green. sig and x are kept in sync.

[e] = e’ := (let r := [e] in r = e’)

(c) Syntactic sugar. r not free in e’.


Fig. 7. Minimal example illustrating busy waiting for condition over heap cell. (Color 
figure online) 


The program involves three new non-thread-safe commands: (i) cons(v) for
allocating a new heap cell and initializing it with value v, (ii) [$\ell$] := v for assigning
value v to heap location $\ell$, (iii) [$\ell$] for reading the value stored in heap location $\ell$.
We use [e] = e’ as syntactic sugar for let r := [e] in r = e’.

In our example, the main thread allocates x, initializes it with the value 0 and 
protects it using mutex mut. It forks a new thread busy-waiting for x to be set. 
Afterwards, the main thread sets x. As explained above, we verify the program by 
inserting a signal sig that reflects whether x has been set yet. Figure 7b presents
the resulting code. The main thread creates the signal and sets it when it sets x. 
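
$\{\mathsf{obs}(\emptyset)\}$
let x := cons(0) in
$\{\mathsf{obs}(\emptyset) * x \mapsto 0\}$
let sig := new_signal in                                PR-NEWSIGNAL” with $L = 1$
let mut := new_mutex in                                 PR-NEWMUTEX” with $L = 0$
                                                        $s := (sig, 1)$, $m := (mut, 0)$
                                                        $P := \exists v.\ x \mapsto v * \mathsf{signal}(s, v = 1)$
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}(m, P) * \mathsf{mutex}(m, P)\}$
fork ($\{\mathsf{obs}(\emptyset) * \mathsf{mutex}(m, P)\}$
      with m await                                      $m.\mathsf{lev}, s.\mathsf{lev} \prec_L \emptyset$
        $\{\mathsf{obs}(\{\!\{m\}\!\}) * P\}$
        $\forall v.\ \{\mathsf{obs}(\{\!\{m\}\!\}) * x \mapsto v * \mathsf{signal}(s, v = 1)\}$
        [x] = 1
        $\{\lambda r.\ \mathsf{obs}(\{\!\{m\}\!\}) * \text{if } r \text{ then } P \text{ else } x \mapsto v \wedge v \neq 1 * \mathsf{signal}(s, \mathsf{False})\}$
      $\{\mathsf{obs}(\emptyset)\}$);
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}(m, P)\}$
acquire mut;                                            $m.\mathsf{lev} = 0 < 1 = s.\mathsf{lev}$
$\forall v.\ \{\mathsf{obs}(\{\!\{s, m\}\!\}) * \mathsf{locked}(m, P) * x \mapsto v * \mathsf{signal}(s, v = 1)\}$
[x] := 1;
$\{\mathsf{obs}(\{\!\{s, m\}\!\}) * \mathsf{locked}(m, P) * x \mapsto 1 * \mathsf{signal}(s, v = 1)\}$
set_signal(sig);
$\{\mathsf{obs}(\{\!\{m\}\!\}) * \mathsf{locked}(m, P) * x \mapsto 1 * \mathsf{signal}(s, \mathsf{True})\}$
release mut
$\{\mathsf{obs}(\emptyset)\}$
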
(a) Proof outline for program 7b. Applied proof rules marked in purple. Abbreviations 
marked in brown. General hints marked in red. 


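PR-CONS
$$\vdash \{\mathsf{True}\}\ \texttt{cons}(v)\ \{\lambda \ell.\ \ell \mapsto v\}$$

PR-ASSIGNTOHEAP
$$\vdash \{\ell \mapsto \_\}\ [\ell] := v\ \{\ell \mapsto v\}$$

PR-READHEAPLOC”’
$$\vdash \{\ell \mapsto v\}\ [\ell]\ \{\lambda r.\ r = v * \ell \mapsto v\}$$

PR-EXP
$$\frac{[\![e]\!] \in \mathit{Values}}{\vdash \{\mathsf{True}\}\ e\ \{\lambda r.\ r = [\![e]\!]\}}$$

(b) Proof rules. Evaluation function $[\![\cdot]\!]$. Rules only used in this section marked with ”’.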


Fig. 8. Verifying termination of busy waiting for condition over heap cell. (Color figure 
online) 


Heap Cells. Verifying this example does not conceptually differ from the example 
we presented in Sect. 2.2. Figure 8b presents the new proof rules we need and Fig. 8a 
sketches our example’s verification. As with non-thread-safe signals, we have to 
prevent multiple threads from trying to access x at the same time in order to prevent
data races. For this we use so-called points-to chunks [24,31]. They have the
form $\ell \mapsto v$ and express that heap location $\ell$ stores the value $v$. When a thread
holds such a chunk, it exclusively owns the right to access heap location $\ell$.

Heap locations are unique and the only way to create a new points-to chunk
is to allocate and initialize a new heap cell via cons(v), cf. PR-CONS. Hence,
there will never be two points-to chunks involving the same heap location. In
order to read or write a heap cell via [$\ell$] or [$\ell$] := e, the acting thread must first
acquire possession of the corresponding points-to chunk, cf. PR-ASSIGNTOHEAP
and PR-READHEAPLOC”’.


Relating Signals to Conditions. In our example, the forked thread busy-waits for 
x to be set while our proof rules require us to justify each iteration by showing 
an unset signal. That is, we must prove an invariant stating that the value of x 
matches sig. As this invariant must be shared between both threads, we encode 
it in the lock invariant: $P := \exists v.\ x \mapsto v * \mathsf{signal}(s, v = 1)$. This not only
allows both threads to share the heap cell and the signal but also automatically
enforces that they maintain the invariant whenever they acquire and release the
mutex.
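
The role of this lock invariant can be illustrated by instrumenting the program from Fig. 7b with a runtime check. The Python sketch below is only an informal analogue under assumptions: `Shared`, `check_invariant` and `run_fig7b` are hypothetical names, the signal is modelled as a plain boolean field, and the check is performed at the points where the mutex is acquired and just before it is released.

```python
import threading
import time


class Shared:
    """Heap cell x and signal sig, both protected by mutex mut."""

    def __init__(self):
        self.x = 0                       # let x := cons(0)
        self.sig = False                 # let sig := new_signal
        self.mut = threading.Lock()      # let mut := new_mutex
        self.violations = 0              # only updated while mut is held

    def check_invariant(self):
        # Lock invariant: the signal is set exactly if x holds the value 1.
        if self.sig != (self.x == 1):
            self.violations += 1


def run_fig7b():
    sh = Shared()

    def waiter():                        # with mut await [x] = 1
        while True:
            with sh.mut:
                sh.check_invariant()
                done = (sh.x == 1)
            if done:
                return
            time.sleep(0)                # yield before the next iteration

    t = threading.Thread(target=waiter)
    t.start()                            # fork

    with sh.mut:                         # acquire mut
        sh.check_invariant()
        sh.x = 1                         # [x] := 1
        sh.sig = True                    # set_signal(sig)
        sh.check_invariant()             # invariant restored before release

    t.join(timeout=5.0)
    return (sh.x, sh.sig, sh.violations, t.is_alive())
```

Between the two assignments the invariant is temporarily broken, which is harmless because no other thread can observe the state while the mutex is held.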


2.4 Signal Erasure 


In the program from Fig. 7b, signal sig is never read and hence does not influence
the waiting thread’s runtime behaviour. Therefore, we can verify the original 
program presented in Fig. 7a by erasing the physical signal and treating it as 
ghost code. 


Ghost Signals. Central aspects of the proof sketch we presented in Fig. 8a are 
that (i) the main thread was obliged to set sig and that (ii) the value of sig 
reflected whether x was already set. Ghost signals allow us to keep this information
but at the same time remove the physical signals from the code. Ghost
signals are essentially identical to the physical non-thread-safe signals we used 
so far. However, as ghost resources they cannot influence the program’s runtime 
behaviour. They merely carry information we can use during the verification 
process. 


View Shifts Revisited. We implement ghost signals by extending our view shift 
relation. In particular, we introduce two new view shift rules: VS-NEWSIGNAL 
and VS-SETSIGNAL presented in Fig. 9b. The former creates a new unset signal 
and simultaneously spawns an obligation to set it. The latter can be used to set 
a signal and thereby discharge a corresponding obligation. We say that these 
rules change the ghost state and therefore call their application a ghost proof 
step. With this extension, a view shift $A \Rrightarrow B$ expresses that we can reach
postcondition $B$ from precondition $A$ by (i) drawing semantic conclusions or by
(ii) manipulating the ghost state. In Fig. 9a we use ghost signals to verify the
program from Fig. 7a.

Note that lifting signals to the verification level does not affect the soundness 
of our approach. The argument we presented in Sect. 2.1 still holds. We formalize 
our logic and provide a formal soundness proof in the extended version of this 
paper [28] and in the technical report [29]. The latter contains a more general 
version of the presented logic that (i) is not restricted to busy-waiting loops of 
the form with mut await c and that (ii) is easier to integrate into existing tools 
like VeriFast [12], as explained in Sect. 5. 
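
The idea that ghost steps can be erased without changing what the program computes can be pictured with a deliberately simple, single-threaded Python sketch. Everything here is an assumption made for illustration: `Ghost`, `Erased` and `program` are hypothetical names, and the bookkeeping only loosely mirrors VS-NEWSIGNAL and VS-SETSIGNAL.

```python
class Ghost:
    """Tracks ghost signals and obligations; never touches program data."""

    def __init__(self):
        self.signals = {}        # ghost signal id -> current boolean value
        self.obligations = []    # ids of signals that still have to be set

    def new_signal(self):
        sid = len(self.signals)
        self.signals[sid] = False        # new ghost signals are unset
        self.obligations.append(sid)     # creating one spawns an obligation
        return sid

    def set_signal(self, sid):
        self.obligations.remove(sid)     # discharge the obligation
        self.signals[sid] = True


class Erased:
    """Same interface with every ghost step erased."""

    def new_signal(self):
        return None

    def set_signal(self, sid):
        pass


def program(ghost):
    heap = {"x": 0}              # let x := cons(0)
    s = ghost.new_signal()       # ghost step: create signal and obligation
    heap["x"] = 1                # [x] := 1
    ghost.set_signal(s)          # ghost step: set signal, discharge obligation
    return heap
```

Running `program` with `Ghost()` and with `Erased()` yields the same heap; only the former additionally records that all obligations have been discharged.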
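
$\{\mathsf{obs}(\emptyset)\}$
let x := cons(0) in
$\{\mathsf{obs}(\emptyset) * x \mapsto 0\}$
new_ghost_signal;                                               VS-NEWSIGNAL with $L = 1$
$\{\exists sig.\ \mathsf{obs}(\{\!\{(sig, 1)\}\!\}) * x \mapsto 0 * \mathsf{signal}((sig, 1), \mathsf{False})\}$     $s := (sig, 1)$
$\forall sig.\ \{\mathsf{obs}(\{\!\{s\}\!\}) * x \mapsto 0 * \mathsf{signal}(s, \mathsf{False})\}$                  $P := \exists v.\ x \mapsto v * \mathsf{signal}(s, v = 1)$
let mut := new_mutex in                                         PR-NEWMUTEX” with $L = 0$
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}((mut, 0), P) * \mathsf{mutex}((mut, 0), P)\}$         $m := (mut, 0)$
fork ($\{\mathsf{obs}(\emptyset) * \mathsf{mutex}(m, P)\}$
      with m await                                              $m.\mathsf{lev}, s.\mathsf{lev} \prec_L \emptyset$
        $\{\mathsf{obs}(\{\!\{m\}\!\}) * P\}$
        $\forall v.\ \{\mathsf{obs}(\{\!\{m\}\!\}) * x \mapsto v * \mathsf{signal}(s, v = 1)\}$
        [x] = 1
        $\{\lambda r.\ \mathsf{obs}(\{\!\{m\}\!\}) * \text{if } r \text{ then } P \text{ else } x \mapsto v \wedge v \neq 1 * \mathsf{signal}(s, \mathsf{False})\}$
      $\{\mathsf{obs}(\emptyset)\}$);
$\{\mathsf{obs}(\{\!\{s\}\!\}) * \mathsf{mutex}(m, P)\}$
acquire mut;                                                    $m.\mathsf{lev} = 0 < 1 = s.\mathsf{lev}$
$\forall v.\ \{\mathsf{obs}(\{\!\{s, m\}\!\}) * \mathsf{locked}(m, P) * x \mapsto v * \mathsf{signal}(s, v = 1)\}$
[x] := 1;
set_ghost_signal(s);
$\{\mathsf{obs}(\{\!\{m\}\!\}) * \mathsf{locked}(m, P) * x \mapsto 1 * \mathsf{signal}(s, \mathsf{True})\}$
release mut
$\{\mathsf{obs}(\emptyset)\}$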


(a) Proof outline for the program presented in Fig. 7a. Auxiliary commands hinting at 
view shifts and general hints marked in red. Applied proof and view shift rules marked 
in purple. Abbreviations marked in brown. 


VS-NEWSIGNAL
L ∈ Levs
------------------------------------------------------------------
obs(O) ⇛ ∃id. obs(O ⊎ {(id, L)}) * signal((id, L), False)


VS-SETSIGNAL
obs(O ⊎ {s}) * signal(s, _) ⇛ obs(O) * signal(s, True)


(b) Proof rules. 


Fig. 9. Verifying termination with ghost signals. (Color figure online) 


3 A Realistic Example 


To demonstrate the expressiveness of the presented verification approach, we 
verified the termination of the program presented in Fig. 10a. It involves two 
threads, a consumer and a producer, communicating via a shared bounded FIFO 
with a maximal capacity of 10. The producer enqueues numbers 100, ..., 1 into
the FIFO and the consumer dequeues those. Whenever the queue is full, the 
producer busy-waits for the consumer to dequeue an element. Likewise, whenever 
the queue is empty, the consumer busy-waits for the producer to enqueue the 
next element. Each thread’s finishing depends on the other thread’s productivity. 
This is, however, no cyclic dependency. For instance, in order to prove that the 
producer eventually pushes number i into the queue, we only need to rely on the 
consumer to pop i+ 10. A similar property holds for the consumer. 


alloc_ghost_signal_IDs(id_pop^i, id_push^i) for 1 ≤ i ≤ 100;
L_pop^i := 102 − i,  L_push^i := 101 − i,  s_x^i := (id_x^i, L_x^i) for 1 ≤ i ≤ 100
init_ghost_signals(s_push^100, s_pop^100);
{obs({s_push^100, s_pop^100}) * ...}
let fifo10 := cons(nil) in let mut := new_mutex in
let cp := cons(100) in let cc := cons(100) in

fork (while (                                       cp decreases in each iteration.
        with mut await (                            Busy-wait for fifo10 not being full.
          {obs({s_push^cp, (mut, 0)}) * ...}        Wait for consumer to pop.
          let f := [fifo10] in
          if size(f) < 10 then (                    If fifo10 not full, push next element.
            let c := [cp] in [fifo10] := f · ⟨c⟩; [cp] := c − 1;
            set_ghost_signal(s_push^c);
            if c − 1 ≠ 0 then init_ghost_signal(s_push^(c−1)));
          size(f) ≠ 10);                            if size(f) = 10 then wait for s_pop^(cp+10)
        [cp] ≠ 0)                                   L_pop^(cp+10) = 92 − cp < 101 − cp = L_push^cp
      do skip);
while (                                             cc decreases in each iteration.
  with mut await (                                  Busy-wait for fifo10 not being empty.
    {obs({s_pop^cc, (mut, 0)}) * ...}               Wait for producer to push.
    let f := [fifo10] in
    if size(f) > 0 then (                           If fifo10 not empty, pop next element.
      let c := [cc] in [fifo10] := tail(f); [cc] := c − 1;
      set_ghost_signal(s_pop^c);
      if c − 1 ≠ 0 then init_ghost_signal(s_pop^(c−1)));
    size(f) > 0);                                   if size(f) = 0 then wait for s_push^cc
  [cc] ≠ 0)                                         L_push^cc = 101 − cc < 102 − cc = L_pop^cc
do skip);


(a) Example program with two threads communicating via a shared bounded FIFO 
with maximal size 10. Auxiliary commands hinting at view shifts and general hints 
marked in red. Abbreviations marked in brown. Hints on proof state marked in blue. 


VS-ALLOCSIGID
True ⇛ ∃id. uninitSig(id)


VS-SIGINIT
L ∈ Levs
------------------------------------------------------------------------------
obs(O) * uninitSig(id) ⇛ obs(O ⊎ {(id, L)}) * signal((id, L), False)


(b) Fine-grained view shift rules for signal creation. 


Fig. 10. Realistic example program. (Color figure online) 
Fine-Tuning Signal Creation. To simplify complex proofs involving many signals,
we refine the process of creating a new ghost signal. So far, for simplicity, we combined
the allocation of a new signal ID and its association with a level and a boolean
in one step. For some proofs, such as the one we outline in this section, it 
can be helpful to fix the IDs of all signals that will be created throughout the 
proof already at the beginning. To realize this, we replace view shift rule
VS-NEWSIGNAL by the rules presented in Fig. 10b and adapt our signal chunks
accordingly. With these more fine-grained view shifts, we start by allocating 
a signal ID, cf. VS-ALLOCSIGID. Thereby we obtain an uninitialized signal 
uninitSig(id) that is not associated with any level or boolean, yet. Also, allocating 
a signal ID does not create any obligation because threads can only wait for 
initialized (and unset) signals. When we initialize a signal, we bind its already 
allocated ID to a level of our choice and associate the signal with False, cf.
VS-SIGINIT. This creates an obligation to set the signal.


Loops and Signals. In our program, both threads have a local counter initially 
set to 100 and run a nested loop. The outer loops are controlled by their thread’s 
counter, which is decreased in each iteration until it reaches 0 and the loop stops. 
For such loops, we introduce a conventional proof rule for total correctness of 
loops, cf. this paper’s extended version [28]. Verifying termination of the inner 
loops is a bit more tricky and requires the use of ghost signals. 

So far, we had to fix a single signal for the verification of every await loop. 
We can relax this restriction to considering a finite set of signals the loop may 
wait for, cf. PR-AWAIT presented in [28]. Apart from being a generalisation, this 
rule does not differ from PR-AWAIT” introduced in Sect. 2.2. 

Initially, we allocate 200 signal IDs $id^{1}_{push}, \ldots, id^{100}_{push}, id^{1}_{pop}, \ldots, id^{100}_{pop}$. We are
going to ensure that always at most one push signal and at most one pop signal
are initialized and unset. The producer and consumer are going to hold the
obligation for the push and pop signal, respectively. The producer will hold the
obligation for $s^{i}_{push}$ while $i$ is the next number to be pushed into the FIFO and
it will set $s^{i}_{push}$ when it pushes the number $i$ into the FIFO. Meanwhile, the
consumer will use $s^{i}_{push}$ to wait for the number $i$ to arrive in the queue when it
is empty. Similarly, the consumer will hold the obligation for $s^{i}_{pop}$ while number
$i$ is the next number to be popped from the FIFO and will set $s^{i}_{pop}$ when it pops
the number $i$. The producer uses $s^{i}_{pop}$ to wait for the consumer to pop $i$ from
the queue when it is full. At any time, we let the mutex mut protect the two
active signals and thereby make them accessible to both threads.


Choosing the Levels. Note that we ignored the levels so far. The producer and the 
consumer both acquire the mutex while holding an obligation for a signal. Hence, 
we choose Levs = N, m.lev = 0 and s.lev > 0 for every signal s. Both threads will 
justify iterations of their respective await loop by using an unset signal at the 
end of such an iteration. Our proof rules allow us to ignore the mutex obligation 
during this step. Hence, the mutex level does not interfere with the level of the 
unset signal. Whenever the queue is full, the producer waits for the consumer 
to pop an element and whenever the queue is empty, the consumer waits for
the producer to push. That is, the producer waits for $s^{i+10}_{pop}$ while holding an
obligation for $s^{i}_{push}$ and the consumer waits for $s^{i}_{push}$ while holding an obligation
for $s^{i}_{pop}$. So, we have to choose the signal levels such that $s^{i+10}_{pop}.lev < s^{i}_{push}.lev$
and $s^{i}_{push}.lev < s^{i}_{pop}.lev$ hold. We solve this by choosing $s^{i}_{pop}.lev = 102 - i$ and
$s^{i}_{push}.lev = 101 - i$.
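
The two ordering requirements on the levels are easy to check mechanically. The following sketch simply enumerates all indices; the function names `pop_level` and `push_level` and the constants are illustrative assumptions that encode the level choice stated above, with the mutex level 0 taken from the text.

```python
CAPACITY = 10   # maximal size of the FIFO
N = 100         # numbers N, ..., 1 are pushed and popped
MUTEX_LEVEL = 0


def pop_level(i):
    """Level chosen for the pop signal with index i."""
    return 102 - i


def push_level(i):
    """Level chosen for the push signal with index i."""
    return 101 - i


# Producer: waits for pop signal i + 10 while holding the push obligation i.
# Only indices with i + 10 <= N can occur.
producer_ok = all(pop_level(i + CAPACITY) < push_level(i)
                  for i in range(1, N - CAPACITY + 1))

# Consumer: waits for push signal i while holding the pop obligation i.
consumer_ok = all(push_level(i) < pop_level(i) for i in range(1, N + 1))

# Both threads acquire the mutex while holding a signal obligation.
mutex_ok = all(MUTEX_LEVEL < level(i)
               for level in (pop_level, push_level)
               for i in range(1, N + 1))

assert producer_ok and consumer_ok and mutex_ok
```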


Verifying Termination. This setup suffices to verify the example program. Via 
the lock invariant, each thread has access to both active signals. Whenever the 
producer pushes a number $i$ into the queue, it sets $s^{i}_{push}$, which discharges the held
obligation, and decreases its counter. Afterwards, if $i > 1$, it uses the uninitialized
signal chunk $uninitSig(id^{i-1}_{push})$ to initialize $s^{i-1}_{push} = (id^{i-1}_{push}, 101 - (i - 1))$ and
replaces $s^{i}_{push}$ in the lock invariant by $s^{i-1}_{push}$ before it releases the lock. If $i = 1$,
the counter reached 0 and the loop ends. In this case, the producer holds no 
obligation. The consumer behaves similarly. Since we proved that each thread 
discharged all its obligations, we proved that the program terminates. Figure 10a 
illustrates the most important proof steps. We present the program’s verification 
in full detail in the extended version of this paper [28] and in the technical 
report [29]. Furthermore, we encoded [27] the proof in VeriFast [12]. 

The number of threads in this program is fixed. However, our approach also 
supports the verification of programs where the number of threads is not even 
statically bounded. In [28] we present and verify such a program. It involves N 
producer and N consumer threads that communicate via a shared buffer of size 
1, for a random number N > 0 determined during runtime. 


4 Specifying Busy-Waiting Concurrent Objects 


Our approach can be used to verify busy-waiting concurrent objects with respect 
to abstract specifications. For example, we have verified [26] the CLH lock [7] 
against a specification that is very similar to our proof rules for built-in mutexes 
shown in Fig. 6. The main difference is that it is slightly more abstract: when a 
lock is initialized, it is associated with a bounded infinite set of levels rather than 
with a single particular level. (To make this possible, an appropriate universe 
of levels should be used, such as the set of lists of natural numbers, ordered 
lexicographically.) To acquire a lock, the levels of the obligations held by the 
thread must be above the elements of the set; the new obligation’s level is an 
element of the set. 


5 Tool Support 


We have extended the VeriFast tool [10] for separation logic-based modular ver- 
ification of C and Java programs so that it supports verifying termination of 
busy-waiting C or Java programs. When verifying termination, VeriFast con- 
sumes a call permission at each recursive call or loop iteration. In the technical 
report [29] we define a generalised version of our logic that, instead of providing
a special proof rule for busy-waiting loops, provides wait permissions and a wait
view shift. A call permission of a degree δ can be turned into a wait permission
of a degree δ' < δ for a given signal s. A wait view shift for an unset signal s for
which a wait permission of degree δ exists produces a call permission of degree
δ, which can be used to fuel a busy-waiting loop. When busy-waiting for some
signal s, we can generate new permissions to justify each iteration as long as 
s remains unset. 

VeriFast allows threads to freely exchange permissions. This is useful to verify 
termination of non-blocking algorithms involving compare-and-swap loops [11]. 
However, we must be careful to prevent self-fueling busy-waiting loops. Hence, 
we restrict where a permission can be consumed based on the thread phase it 
was created in. The main thread's initial phase is ε. When a thread in phase p
forks a new thread, its phase changes to p.Forker and the new thread starts in
phase p.Forkee. We allow a thread in phase p to consume a permission only if it
was produced in an ancestor thread phase p' ⊑ p.
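
The thread phase rule can be illustrated with a few lines of code. This is a minimal sketch, not VeriFast's implementation: phases are modelled as tuples of the strings "Forker" and "Forkee", the empty tuple stands for the initial phase, and the ancestor relation is assumed to be the (reflexive) prefix order. The helper names `fork`, `is_ancestor` and `may_consume` are hypothetical.

```python
FORKER, FORKEE = "Forker", "Forkee"
INITIAL_PHASE = ()   # models the main thread's initial phase


def fork(phase):
    """Return (new phase of the forking thread, phase of the forked thread)."""
    return phase + (FORKER,), phase + (FORKEE,)


def is_ancestor(ancestor, phase):
    """Assumed reading of the ancestor relation: `ancestor` is a prefix of `phase`."""
    return phase[:len(ancestor)] == ancestor


def may_consume(thread_phase, permission_phase):
    """A thread may consume a permission only if it was produced in an ancestor phase."""
    return is_ancestor(permission_phase, thread_phase)


parent, child = fork(INITIAL_PHASE)
assert may_consume(child, INITIAL_PHASE)    # produced before the fork
assert not may_consume(child, parent)       # produced by the forker after the fork
assert not may_consume(parent, child)       # produced by the forkee
```

Under this reading, a permission produced by one thread after a fork can never be consumed by the sibling thread, which is what rules out two busy-waiting threads fueling each other.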

The only change we had to make to VeriFast’s symbolic execution engine 
was to enforce the thread phase rule. We encoded the other aspects of the logic 
simply as axioms in a trusted header file. We used this tool support to verify 
the bounded FIFO (Sect. 3) and the CLH lock (Sect. 4). The bounded FIFO 
proof [27] contains 160 lines of proof annotations for 37 lines of code (an anno- 
tation overhead of 435%) and takes 0.08s to verify. The CLH lock proof [26] 
contains 343 lines of annotations for 49 lines of code (an overhead of 700%) and 
takes 0.1s to verify. 


6 Integrating Higher-Order Features 


The logic we presented in this paper does not support higher-order features such 
as assertions that quantify over assertions, or storing assertions in the (logical) 
heap as the values of ghost cells. While we did not need such features to carry 
out our example proofs, they are generally useful to verify higher-order program 
modules against abstract specifications. The typical way to support such features 
in a program logic is by applying step indexing [1,17], where the domain of logical 
heaps is indexed by the number of execution steps left in the (partial) program 
trace under consideration. Assertions stored in a logical heap at index n+1 talk 
about logical heaps at index n; i.e., they are meaningful only later, after at least 
one more execution step has been performed. 

It follows that such logics apply directly only to partial correctness prop- 
erties. Fortunately, we can reduce a termination property to a safety property 
by writing our program in a programming language instrumented with runtime 
checks that guarantee termination. Specifically, we can write our program in a 
programming language that fulfils the following criteria: It tracks signals, obliga- 
tions and permissions at runtime and has constructs for signal creation, waiting 
and setting a signal. The fork command takes as an extra operand the list of 
obligations to be transferred to the new thread (and the other constructs simi- 
larly take sufficient operands to eliminate any need for angelic choice). Threads 
get stuck when these constructs’ preconditions are not satisfied, such as when a
thread waits for a signal while holding the obligation for that signal. We can then 
use a step-indexing-based higher-order logic such as Iris [14] to verify that no 
thread in our program ever gets stuck. Once we have established this, we know that
none of the instrumentation has any effect and that it can be safely erased from the program.


7 Related and Future Work 


In recent work [30] we propose a separation logic to verify termination of pro- 
grams where threads busy-wait to be abruptly terminated. We generalize this 
work to support busy waiting for arbitrary conditions. 

In [11] we propose an approach based on call permissions to verify termi- 
nation of single- and multithreaded programs that involve loops and recursion. 
However, that work does not consider busy-waiting loops. In the technical report, 
we present a generalised logic that uses call permissions and allows busy waiting 
to be implemented using arbitrary looping and/or recursion. Furthermore, the 
use of call permissions allowed us to encode our case studies in our VeriFast tool 
which also uses call permissions for termination verification. 

Liang and Feng [20,21] propose LiLi, a separation logic to verify liveness of 
blocking constructs implemented via busy waiting. In contrast to our verification 
approach, theirs is based on the idea of contextual refinement. In their approach, 
client code involving calls of blocking methods of the concurrent object is verified 
by first applying the contextual refinement result to replace these calls by code 
involving primitive blocking operations and then verifying the resulting client 
code using some other approach. In contrast, specifications in our approach are 
regular Hoare-style triples and proofs are regular Hoare-style proofs. 

In [9] we propose a Hoare logic to verify liveness properties of the I/O 
behaviour of programs that do not perform busy waiting. By combining that 
approach with the one we proposed in this paper, we expect to be able to verify 
I/O liveness of realistic concurrent programs involving both I/O and busy wait- 
ing, such as a server where one thread receives requests and enqueues them into 
a bounded FIFO, and another one dequeues them and responds. To support this 
claim, we encoded the combined logic in VeriFast and verified a simple server 
application where the receiver and responder thread communicate via a shared 
buffer [25]. 


8 Conclusion 


We propose what is to the best of our knowledge the first separation logic for 
verifying termination of programs with busy waiting. We offer a soundness proof 
of the system of the paper in its extended version [28], and of a more general 
system in the technical report [29]. Further, we demonstrated its usability by 
verifying a realistic example. We encoded our logic and the realistic example in 
VeriFast [27] and used this encoding also to verify the CLH lock [26]. Moreover, 
we expect that our approach can be integrated into other existing concurrent 
separation logics such as Iris [14]. 
Abstract. This paper shows how techniques for linear dynamical systems
can be used to reason about the behavior of general loops. We
present two main results. First, we show that every loop that can be 
expressed as a transition formula in linear integer arithmetic has a best 
model as a deterministic affine transition system. Second, we show that 
for any linear dynamical system f with integer eigenvalues and any inte- 
ger arithmetic formula G, there is a linear integer arithmetic formula 
that holds exactly for the states of f for which G is eventually invari- 
ant. Combining the two, we develop a monotone conditional termination 
analysis for general loops. 
1 Introduction


Linear and affine dynamical systems are a model of computation that is easy to 
analyze (relative to non-linear systems), making them useful across a broad array 
of applications. In the context of program analysis, affine dynamical systems 
correspond to loops of the form 


while (G(x)) do x := Ax + b        (†)
where G is a formula, A is a matrix, x is a vector of program variables, and b is 
a constant vector. The termination problem for such loops has been shown to be 
decidable for several variations of this model [4,9, 12,24,29]. However, few loops 
in real programs take this form, and so this work has not yet made an impact on 
practical termination analysis tools. This paper bridges the gap between theory 
and practice, showing how techniques for linear and affine dynamical systems 
can be used to reason about general programs. 


Example 1. We illustrate our methodology using the example program in Fig. 1 
(left). First, observe that although the body of this loop is not of the form (†), the
value of the sum x + y decreases by z each iteration, and z remains the same. 
Thus, we can approximate the loop by the linear dynamical system in Fig. 1 
(right), where the nature of the approximation is given by the linear map in the 
center of Fig. 1 (i.e., the a coordinate corresponds to x + y, and the b coordinate 
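Program (left):

z := 1
while (x > 0 ∧ y > 0) do
  w := 3w + x + 1
  if (2 | x − y):
    x := x − z
  else:
    y := y − z

Linear map (center):

$$\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}$$

Linear dynamical system (right):

$$\begin{bmatrix} a \\ b \end{bmatrix} := \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$$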


Fig. 1. Over-approximation of a loop by a linear dynamical system. 


to z). The linear map is a simulation, in the sense that it transforms the state 
space of the program into the state space of the linear dynamical system so that 
every step in the loop has a corresponding step in the linear dynamical system. 

Next, we compute the image of the guard of the loop (x > 0 ∧ y > 0)
under the simulation, which yields a > 0 (corresponding to the constraint x + 
y > 0 over the original program variables). We can compute a closed form for 
this constraint holding on the kth iteration of the loop by exponentiating the 
dynamics matrix of the linear dynamical system, multiplying on the left by the 
row vector corresponding to the constraint, and on the right by the simulation: 


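$$\underbrace{\begin{bmatrix} 1 & 0 \end{bmatrix}}_{\text{Constraint}} \; \underbrace{\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}^{k}}_{\text{Dynamics}} \; \underbrace{\begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{Simulation}} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = (x + y) - kz.$$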


We then analyze the asymptotic behavior of the closed form: 


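$$\text{As } k \to \infty, \quad (x + y) - kz \to \begin{cases} -\infty & \text{if } z > 0 \\ x + y & \text{if } z = 0 \\ \infty & \text{if } z < 0 \end{cases}$$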


We conclude that z > 0 ∨ (x + y) ≤ 0 is a sufficient condition for the loop to
terminate.
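
The closed form in Example 1 can be cross-checked numerically. The sketch below is an illustration under stated assumptions: the matrices are those of the linear map and the linear dynamical system described in the example (a corresponds to x + y, b to z), the helper names are hypothetical, and the branch condition in `loop_step` is an assumption that does not affect the result, since either branch decreases x + y by z. The guard is deliberately ignored in `loop_step` so that the image of the guard can be tracked for any number of steps.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mat_pow(A, k):
    """k-th power of a square matrix by repeated multiplication (k >= 0)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


CONSTRAINT = [[1, 0]]                       # selects the a coordinate
DYNAMICS = [[1, -1], [0, 1]]                # a := a - b, b := b
SIMULATION = [[0, 1, 1, 0], [0, 0, 0, 1]]   # a = x + y, b = z


def guard_image_after(k, w, x, y, z):
    """Value of the guard's image a after k steps, computed via the matrices."""
    state = [[w], [x], [y], [z]]
    row = mat_mul(mat_mul(CONSTRAINT, mat_pow(DYNAMICS, k)), SIMULATION)
    return mat_mul(row, state)[0][0]


def loop_step(w, x, y, z):
    """One iteration of the loop body (guard not checked)."""
    w = 3 * w + x + 1
    if (x - y) % 2 == 0:
        x -= z
    else:
        y -= z
    return w, x, y, z


# The matrix product agrees with the closed form (x + y) - k*z ...
w, x, y, z = 5, 7, 4, 2
for k in range(6):
    assert guard_image_after(k, w, x, y, z) == (x + y) - k * z

# ... and with the concrete loop, step by step.
state = (w, x, y, z)
for k in range(6):
    assert state[1] + state[2] == guard_image_after(k, w, x, y, z)
    state = loop_step(*state)
```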


The paper is organized as follows. To serve as the class of “linear models” 
of loops, we introduce deterministic affine transition systems (DATS), a computational
model that generalizes affine dynamical systems. Section 3 shows that
any loop expressed as a linear integer arithmetic formula has a DATS-reflection, 
which is a best representation of the behavior of the loop as a DATS. Moreover, 
this holds for a restricted class of DATS with rational eigenvalues. Section 4 
shows that for a linear map f with integer eigenvalues and a linear integer arithmetic
formula G, there is a linear integer arithmetic formula that holds exactly
for those states x such that $G(f^{k}(x))$ holds for all but finitely many $k \in \mathbb{N}$.
Section 5 brings the results together, showing that the analysis of a DATS with
rational eigenvalues can be reduced to the analysis of a linear dynamical system
with integer eigenvalues. The fact that DATS-reflections are best implies monotonicity
of the analysis. Finally, in Sect. 6, we demonstrate experimentally that
the analysis can be successfully applied to general programs, using the framework 
of algebraic termination analysis [34] to lift our loop analysis to a whole-program 
conditional termination analysis. Some proofs are omitted for space, but may be 
found in the extended version of this paper [33]. 


2 Preliminaries 


This paper assumes familiarity with linear algebra — see for example [19]. We 
recall some basic definitions below. 

In the following, a linear space refers to a finite-dimensional linear space
over the field of rational numbers $\mathbb{Q}$. For $V$ a linear space and $U \subseteq V$, span($U$)
is the linear space generated by $U$; i.e., the smallest linear subspace of $V$ that
contains $U$. An affine subspace of a linear space $V$ is the image of a linear
subspace of $V$ under a translation (i.e., a set of the form $\{v + v_0 : v \in U\}$ for
some linear subspace $U \subseteq V$ and some $v_0 \in V$). For any scalar $a \in \mathbb{Q}$, and any
linear space $V$, we use $a$ to denote the linear map $a : V \to V$ that maps $v \mapsto av$
(in particular, 1 is the identity). A linear functional on a linear space $V$ is a
linear map $V \to \mathbb{Q}$; the set of all linear functionals on $V$ forms a linear space
called the dual space of $V$, denoted $V^{*}$. A linear map $f : V_1 \to V_2$ induces a
dual linear map $f^{*} : V_2^{*} \to V_1^{*}$ where $f^{*}(g) = g \circ f$. For any linear space $V$, $V$ is
naturally isomorphic to $V^{**}$, where the isomorphism maps $v \mapsto \lambda f : V^{*}.\, f(v)$.

Let $V$ be a linear space. A linear map $f : V \to V$ is associated with a
characteristic polynomial $p_f(x)$, which is defined to be the determinant of
$(xI - A_f)$, where $A_f$ is a matrix representation of $f$ with respect to some basis
(the choice of which is irrelevant). Define the spectrum (set of eigenvalues)
of $f$ to be the set of (possibly complex) roots of its characteristic polynomial,
$\mathrm{spec}(f) = \{\lambda \in \mathbb{C} : p_f(\lambda) = 0\}$. We say that $f$ has rational spectrum if
$\mathrm{spec}(f) \subseteq \mathbb{Q}$; equivalently (by the spectral theorem; see e.g. [19, Ch. 6, Theorem 7]):

– There is a basis $\{x_1, \ldots, x_n\}$ for $V$ consisting of generalized (right) eigenvectors,
  satisfying $(f - \lambda_i)^{r_i}(x_i) = 0$ for some $\lambda_i \in \mathrm{spec}(f)$ and some $r_i \geq 1$ ($r_i$
  is called the rank of $x_i$)

– There is a basis $\{g_1, \ldots, g_n\}$ for $V^{*}$ consisting of generalized left eigenvectors,
  satisfying $g_i \circ (f - \lambda_i)^{r_i} = 0$ for some $\lambda_i \in \mathrm{spec}(f)$ and some $r_i \geq 1$

It is possible to determine whether a linear map has rational spectrum (and compute
the basis of eigenvectors for $V$ and $V^{*}$) in polynomial time by computing
its characteristic polynomial [15], factoring it [22], and checking whether each
factor is linear.
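The syntax of linear integer arithmetic (LIA) is given as follows:

$$\begin{aligned} x &\in \mathsf{Variable} \\ n &\in \mathbb{Z} \\ t \in \mathsf{Term} &::= x \mid n \mid n \cdot t \mid t_1 + t_2 \\ F \in \mathsf{Formula} &::= t_1 \leq t_2 \mid (n \mid t) \mid F_1 \land F_2 \mid F_1 \lor F_2 \mid \lnot F \mid \exists x. F \mid \forall x. F \end{aligned}$$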


Let $X \subseteq \mathsf{Variable}$ be a set of variables. A valuation over $X$ is a map $v : X \to \mathbb{Z}$.
If $F$ is a formula whose free variables range over $X$ and $v$ is a valuation over
$X$, then we say that $v$ satisfies $F$ (written $v \models F$) if the formula $F$ is true when
interpreted over the standard model of the integers, using $v$ to interpret the free
variables. We write $F \models G$ if every valuation that satisfies $F$ also satisfies $G$.


2.1 Transition Systems 


A transition system $T$ is a pair $T = (S_T, R_T)$ where $S_T$ is a set of states and
$R_T \subseteq S_T \times S_T$ is a transition relation. Within this paper, we shall assume that
the state space of any transition system is a finite-dimensional linear space (over
$\mathbb{Q}$). We write $x \to_T x'$ to denote that the pair $(x, x')$ belongs to $R_T$. We define
the domain of a transition system $T$, $\mathrm{dom}(T) \triangleq \{x \in S_T : \exists x'.\, x \to_T x'\}$, to be
the set of states that have a $T$-successor. We define the $\omega$-domain $\mathrm{dom}^{\omega}(T)$ of
$T$ to be the set of states from which there exist infinite $T$-computations:


$$\mathrm{dom}^{\omega}(T) \triangleq \{x_0 \in S_T : \exists x_1, x_2, \ldots \text{ such that } x_0 \to_T x_1 \to_T x_2 \to_T \cdots\}.$$


A transition formula $F(X, X')$ is an LIA formula whose free variables
range over a designated finite set of variables $X$ and a set of “primed copies”
$X' = \{x' : x \in X\}$. For example, a transition formula that represents the body
of the loop in Fig. 1 is


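$$x > 0 \land y > 0 \land w' = 3w + x + 1 \land z' = z \land \Big( \big((2 \mid x - y) \land x' = x - z \land y' = y\big) \lor \big(\lnot(2 \mid x - y) \land x' = x \land y' = y - z\big) \Big) \tag{1}$$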


We use TF to denote the set of transition formulas. A transition formula
$F(X, X')$ defines a transition system where the state space is the set of functions
$X \to \mathbb{Q}$, and where $v \to_F v'$ if and only if both (1) $v$ and $v'$ map each $x \in X$
to an integer and (2) $[v, v'] \models F$, where $[v, v']$ denotes the valuation that maps
each $x \in X$ to $v(x)$ and each $x' \in X'$ to $v'(x)$. Defining the state space of $F$ to
be $X \to \mathbb{Q}$ rather than $X \to \mathbb{Z}$ is a technical convenience ($X \to \mathbb{Q} \cong \mathbb{Q}^{|X|}$ is a
linear space), but does not materially affect the results of this paper since only
(integral) valuations are involved in transitions.
Let $T = (S_T, R_T)$ be a transition system. We say that $T$ is:


– linear if $R_T$ is a linear subspace of $S_T \times S_T$,
– affine if $R_T$ is an affine subspace of $S_T \times S_T$,
– deterministic if $x \to_T x_1'$ and $x \to_T x_2'$ implies $x_1' = x_2'$,
– total if for all $x \in S_T$ there exists some $x' \in S_T$ with $x \to_T x'$.


For example, the transition system T with transition relation 


icy ia 10 a 21 E 0 
rr (Lid Led): ea fol = fo d+ 
ya he oo} Y oij 4s fa 
is deterministic and affine, but not linear or total. The transition system U with 
transition relation 


va fF)) enfi]-aaf]} 


is total, linear (and affine), but not deterministic. The classical notion of a linear 
dynamical system—a transition system where the state evolves according to 
a linear map—corresponds to a total, deterministic, linear transition system. 
Similarly, an affine dynamical system is a transition system that is total, 
deterministic, and affine. 

For any map $s : X \to Y$, and any relation $R \subseteq X \times X$, define the image
of $R$ under $s$ to be the relation $s[R] = \{(s(x), s(x')) : (x, x') \in R\}$. For any
relation $R \subseteq Y \times Y$, define the inverse image of $R$ under $s$ to be the relation
$s^{-1}[R] = \{(x, x') : (s(x), s(x')) \in R\}$. Let $T = (S_T, R_T)$ and $U = (S_U, R_U)$ be
transition systems. We say that a linear map $s : S_T \to S_U$ is a linear simulation
from $T$ to $U$, and write $s : T \to U$, if for all $x \to_T x'$, we have $s(x) \to_U s(x')$.
Observe that the following are equivalent: (1) $s$ is a simulation, (2) $s[R_T] \subseteq R_U$,
and (3) $R_T \subseteq s^{-1}[R_U]$.
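
The three equivalent formulations of a simulation can be tried out on a small finite example. The sketch below is only illustrative: the transition systems are finite fragments chosen for the example (not linear spaces), the map `s` sends a pair (x, y) to x + y, and all names are assumptions introduced here.

```python
def image(s, R):
    """Image of relation R under the map s."""
    return {(s(x), s(x2)) for (x, x2) in R}


def inverse_image(s, R_U, states):
    """Inverse image of R_U under s, restricted to a finite set of states."""
    return {(x, x2) for x in states for x2 in states
            if (s(x), s(x2)) in R_U}


def is_simulation(s, R_T, R_U):
    """Formulation (1): every T-step is matched by a U-step."""
    return all((s(x), s(x2)) in R_U for (x, x2) in R_T)


# T: states (x, y) with 0 <= x, y <= 3; a step decrements x or decrements y.
states_T = [(x, y) for x in range(4) for y in range(4)]
R_T = ({((x, y), (x - 1, y)) for (x, y) in states_T if x >= 1}
       | {((x, y), (x, y - 1)) for (x, y) in states_T if y >= 1})

# U: a single counter that is decremented in each step.
R_U = {(a, a - 1) for a in range(1, 7)}


def s(state):
    """Candidate simulation: track only the sum x + y."""
    return state[0] + state[1]


cond1 = is_simulation(s, R_T, R_U)
cond2 = image(s, R_T) <= R_U
cond3 = R_T <= inverse_image(s, R_U, states_T)
assert cond1 and cond2 and cond3
```

Replacing `s` by the projection onto x breaks all three conditions at once, because a step that decrements y is then mapped to a pair (x, x) that is not a step of U.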

An example of a simulation between a transition formula and a linear dynam- 
ical system is given in Fig. 1. In fact, there are many linear dynamical systems 
that over-approximate this loop; however, the simulation and linear dynamical 
system given in Fig. 1 is its best abstraction. 

To formalize the meaning of best abstractions, it is convenient to use the 
language of category theory [17]. Any class of transition systems defines a category,
where the objects are transition systems of that class, and the arrows
are linear simulations between them. We use boldface letters (Linear, Affine, 
Deterministic, Total) to denote categories of transition systems (e.g., DATS 
denotes the category of Deterministic Affine Transition Systems). 

If $T$ is a transition system and C is a category of transition systems, a
C-abstraction of $T$ is a pair $(U, s)$ consisting of a transition system $U$ belonging to
C and a linear simulation $s : T \to U$. A C-reflection of $T$ is a C-abstraction that
satisfies a universal property among C-abstractions of $T$: for any C-abstraction
$(V, t)$ of $T$ there exists a unique simulation $\bar{t} : U \to V$ such that $\bar{t} \circ s = t$; i.e.,
the triangle formed by $s$, $t$, and $\bar{t}$ commutes.
If D is a category of transition systems and C is a subcategory such that
every transition system in D has a C-reflection, we say that C is a reflective 
subcategory of D. 

Our ultimate goal is to bring techniques from linear dynamical systems to 
bear on transition formulas. Fig. 1 gives an example of a program and its linear 
dynamical system reflection. Unfortunately, such reflections do not exist for all 
transition formulas, which motivates our investigation of alternative models. 


Proposition 1. The transition formula $x' = x \land x = 0$ has no TDATS-reflection.


Proof. Let $F$ be the 1-dimensional transition formula $x' = x \land x = 0$. For a
contradiction, suppose that $(A, s)$ is a TDATS-reflection of $F$. Since $F$ contains
the origin, then so must the transition relation of $A$, and so $A$ is linear. Next,
consider that for any $\lambda \in \mathbb{Q}$, we have the simulation $\mathrm{id} : F \to A_\lambda$, where id is
the identity function and $A_\lambda = (\mathbb{Q}, x \mapsto \lambda x)$. Since $(A, s)$ is a reflection of $F$, for
any $\lambda$, there is some $t_\lambda$ such that $t_\lambda : A \to A_\lambda$ and $\mathrm{id} = t_\lambda \circ s$. Since $t_\lambda$ is a
simulation, we have $\lambda t_\lambda = A_\lambda \circ t_\lambda = t_\lambda \circ A$. Since $\mathrm{id} = t_\lambda \circ s$, we must have $t_\lambda$
non-zero, and so $t_\lambda$ is a left eigenvector of $A$ with eigenvalue $\lambda$. Since this holds
for all $\lambda$, $A$ must have infinitely many eigenvalues, a contradiction.


3 Linear Abstractions of Transition Formulas 


Proposition 1 shows that not every transition formula has a total deterministic 
affine reflection. In the following we show that totality is the only barrier: every 
transition formula has a (computable) DATS-reflection. Moreover, we show that 
every transition formula has a rational spectrum DATS (Q-DATS)-reflection, 
a restricted class of DATS that generalizes affine maps $x \mapsto Ax + b$ where A
has rational eigenvalues. The restriction on eigenvalues makes it easier to reason 
about the termination behavior of Q-DATS. 

In the remainder of this section, we show that every transition formula has 
a Q-DATS-reflection by establishing a chain of reflective subcategories: 


Corollary 1 
qyp gag ES ae ATS 


The fact that Q-DATS is a reflective subcategory of TF then follows from 
the fact that a reflective subcategory of a reflective subcategory is reflective. 


3.1 Affine Abstractions of Transition Formulas 


Let $F(X, X')$ be a transition formula. The affine hull of $F$, denoted aff($F$), is
the smallest affine set $\mathrm{aff}(F) \subseteq (X \cup X') \to \mathbb{Q} \cong (X \to \mathbb{Q}) \times (X \to \mathbb{Q})$ that
contains all of the models of $F$. Reps et al. give an algorithm that can be used
to compute aff($F$), by using an SMT solver to sample a set of generators [26].
Lemma 1. Let $F(X, X')$ be a transition formula. The affine hull of $F$ (considered
as a transition system) is the best affine abstraction of $F$ (where the
simulation from $F$ to aff($F$) is the identity).


Example 2. Consider the example program in Fig. 1. Letting F denote the tran- 
sition formula corresponding to the program, aff(F) can be represented as the 
solutions to the constraints 


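$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} w' \\ x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \tag{2}$$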


Notice that aff(F) is 4-dimensional and has a transition relation defined by 
3 constraints, and thus is not deterministic. The next step is to find a suitable 
projection onto a lower-dimensional space so that the resulting transition system 
is deterministic. 
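
To see concretely why aff(F) is not deterministic, one can check that a single pre-state has several post-states satisfying the three constraints of Example 2. The sketch below is illustrative only: `satisfies_affine_hull` encodes the three constraints (w' = 3w + x + 1, x' + y' = x + y − z, z' = z), while `loop_successor` is an assumed rendering of the loop body of Fig. 1 whose branch condition is a guess that does not matter for the check.

```python
def satisfies_affine_hull(pre, post):
    """Check the three affine constraints relating a pre-state and a post-state."""
    w, x, y, z = pre
    w2, x2, y2, z2 = post
    return (w2 == 3 * w + x + 1
            and x2 + y2 == x + y - z
            and z2 == z)


def loop_successor(pre):
    """Successor computed by the loop body (branch condition is an assumption)."""
    w, x, y, z = pre
    if (x - y) % 2 == 0:
        return (3 * w + x + 1, x - z, y, z)
    return (3 * w + x + 1, x, y - z, z)


pre = (0, 3, 2, 1)
actual = loop_successor(pre)     # the step the program takes
alternative = (4, 2, 2, 1)       # decrement x instead of y

# Both post-states lie in the affine hull although only one is a program step.
assert satisfies_affine_hull(pre, actual)
assert satisfies_affine_hull(pre, alternative)
assert actual != alternative
```

Only the sum x' + y' is determined by the pre-state, which is why the subsequent projection keeps x + y and z but drops the individual values of x and y.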


3.2 Reflections via the Dual Space 


This section presents a key technical tool that will be used in the next two
subsections to prove the existence of reflections. For any transition system $T$,
an abstraction $(U, s)$ of $T$ consisting of a transition system $U$ and a simulation
$s : S_T \to S_U$ induces a subspace of $S_T^{*}$, which is the range of the dual map $s^{*}$
(i.e., the set of all linear functionals on $S_T$ of the form $g \circ s$ where $g \in S_U^{*}$).
The essential idea is that we can apply this in reverse: any subspace $\Lambda$ of $S_T^{*}$ induces
a transition system $U$ and a simulation $s : T \to U$ that satisfies a universal
property among all abstractions $(V, v)$ of $T$ where the range of $v^{*}$ is contained
in $\Lambda$. We will now formalize this idea.

Let T be a transition system, and let A be a subspace of $4. Define a,(T) 
to be the pair a4(T) £ (U,s) consisting of a transition system U and a linear 
simulation s : T — U where 


— s : Sr — A* sends each z € Sr to Af : A. f(x) 
— Sy Ê 4*, and Ry £ s|Rr] = { (s(x), s(x") : (x, 2") € Rr} 


Lemma 2 (Dual space simulation). Let T be a transition system, let Λ be 
a subspace of S_T^*, and let ⟨U, s⟩ = α_Λ(T). Suppose that Z is a transition system 
and z : T → Z is a simulation such that the range of z^* is contained in Λ. Then 
there exists a unique simulation z̄ : U → Z such that z̄ ∘ s = z. 


Proof. The high-level intuition is that since the range of z^* is contained in Λ, 
we may consider it to be a map z^* : S_Z^* → Λ; dualizing again, we get a map 
z^** : Λ^* → S_Z^**, whose domain is S_U and codomain is (isomorphic to) S_Z. 

More formally, let j : S_Z → S_Z^** be the natural isomorphism between S_Z and 
S_Z^** defined by j(y) ≜ λg : S_Z^*. g(y). Define z̄ : Λ^* → S_Z by 

    z̄(h) ≜ j^{-1}(λg : S_Z^*. h(g ∘ z)). 

First we show that z̄ ∘ s = z. Let x ∈ S_T. Then we have 

    (z̄ ∘ s)(x) = z̄(s(x)) 
               = j^{-1}(λg : S_Z^*. (s(x))(g ∘ z)) 
               = j^{-1}(λg : S_Z^*. (λf : Λ. f(x))(g ∘ z)) 
               = j^{-1}(λg : S_Z^*. g(z(x))) 
               = z(x) 

Next we show that z̄ is a simulation. Suppose y →_U y’. Since R_U = s[R_T], there 
are some x, x’ ∈ S_T such that x →_T x’, s(x) = y, and s(x’) = y’. Since z : T → Z 
is a simulation, we have that z(x) →_Z z(x’), and so z̄(s(x)) →_Z z̄(s(x’)), and we 
may conclude that z̄(y) →_Z z̄(y’). 

Finally, observe that s is surjective, and therefore the solution to the equation 
z̄ ∘ s = z is unique. 


We conclude this section by illustrating how to compute the function α for 
affine transition systems. Suppose that T is an affine transition system of 
dimension n. We can represent states in S_T by vectors in Q^n, and the transition 
relation R_T by a finite set of transitions B ⊆ Q^n × Q^n that generates R_T (i.e., 
R_T = aff(B)). Suppose that Λ is an m-dimensional subspace of S_T^*; elements of 
S_T^* can be represented by n-dimensional row vectors, and Λ can be represented 
by a basis f_1^T, ..., f_m^T. We can compute a representation of ⟨U, s⟩ = α_Λ(T) as 
follows. The elements of S_U = Λ^* can be represented by m-dimensional vectors 
(with respect to the basis g_1, ..., g_m such that g_i is the linear map that sends 
f_j^T to 1 if i = j and to 0 otherwise). The simulation s can be represented by the 
m × n matrix whose ith row is f_i^T. Finally, the transition relation R_U can be 
represented by the set of generators { ⟨s(x), s(x’)⟩ : ⟨x, x’⟩ ∈ B }. 
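
The representation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `apply_simulation` and `alpha` are hypothetical, the basis of the functional space is passed as the rows of a matrix `S`, and the generator set in the usage lines is made up for the example rather than taken from the paper.

```python
from fractions import Fraction

def apply_simulation(S, x):
    """Apply the m x n matrix S (rows are the basis functionals) to a state x."""
    return tuple(sum(Fraction(a) * Fraction(b) for a, b in zip(row, x)) for row in S)

def alpha(S, generators):
    """Generators of R_U: the image of each generating transition (x, x') under s."""
    image = {(apply_simulation(S, x), apply_simulation(S, xp)) for x, xp in generators}
    return sorted(image)

# Hypothetical 3-dimensional system over (w, x, y), abstracted by the single functional x + y.
S = [[0, 1, 1]]
gens = [((0, 0, 0), (0, 0, 0)),
        ((0, 1, 0), (1, 1, 0)),
        ((0, 1, 0), (1, 0, 1)),
        ((1, 0, 0), (0, 0, 0)),
        ((0, 0, 1), (0, 0, 1))]
abstract_gens = alpha(S, gens)
```

Five concrete generators collapse to two abstract ones here, because the abstraction only tracks the value of x + y.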


3.3 Determinization 


In this section, we show that any transition system operating over a finite- 
dimensional vector space has a best deterministic abstraction, and give an algo- 
rithm for computing the best deterministic affine abstraction (or determiniza- 
tion) of an affine transition system. 

Towards an application of Lemma 2, we seek to characterize the determinization 
of a transition system by a space of functionals on its state space. For any 
linear space V and space of functionals Λ on V, define an equivalence relation 
≡_Λ on V by x ≡_Λ y iff f(x) = f(y) for all f ∈ Λ. If T is a transition system and 
Λ, Λ’ are spaces of functionals on S_T, we say that T is (Λ, Λ’)-deterministic 
if for all x_1, x_2, x_1’, x_2’ such that x_1 ≡_Λ x_2, x_1 →_T x_1’, and x_2 →_T x_2’, we 
also have x_1’ ≡_Λ’ x_2’. Observe that if D is a deterministic transition system and 
d : T → D is a simulation, then T must be (Λ_d, Λ_d)-deterministic, where Λ_d is 
the range of the dual map d^*. 

For any T and Λ, define Det(T, Λ) ≜ {f : T is (Λ, {f})-deterministic} to 
be the greatest set of functionals such that T is (Λ, Det(T, Λ))-deterministic. 
Observe that Det(T, −) is a monotone operator on the complete lattice of linear 
subspaces of S_T^* (i.e., if Λ_1 ⊆ Λ_2 then Det(T, Λ_1) ⊆ Det(T, Λ_2), since Λ_1 induces 
a coarser equivalence relation than Λ_2). By the Knaster-Tarski fixpoint theorem 
[28], Det(T, −) has a greatest fixpoint, which we denote by Det(T). Then we 
have that T is (Det(T), Det(T))-deterministic, and Det(T) contains every space 
Λ such that T is (Λ, Λ)-deterministic. 


Lemma 3 (Determinization). For any transition system T, α_{Det(T)}(T) is a 
deterministic reflection of T. 


Proof. Let ⟨D, d⟩ ≜ α_{Det(T)}(T). First, we show that D is deterministic. Suppose 
that y →_D y_1’ and y →_D y_2’; we must show that y_1’ = y_2’. Since R_D is defined 
to be d[R_T], there must be x_1, x_2, x_1’, and x_2’ in S_T such that x_1 →_T x_1’, 
x_2 →_T x_2’, d(x_1) = d(x_2) = y, d(x_1’) = y_1’, and d(x_2’) = y_2’. Since d(x_1) = d(x_2), 
we have (λf : Det(T). f(x_1)) = (λf : Det(T). f(x_2)), and therefore x_1 ≡_{Det(T)} x_2. 
We thus have x_1’ ≡_{Det(T, Det(T))} x_2’, and since Det(T, Det(T)) = Det(T), we have 
y_1’ = d(x_1’) = d(x_2’) = y_2’. 

It remains to show that ⟨D, d⟩ is a deterministic reflection of T. Suppose 
that ⟨U, u⟩ is another deterministic abstraction of T. Define G to be the range 
of u^*. Since U is deterministic, we must have G ⊆ Det(T, G), and since Det(T) 
is the greatest fixpoint of Det(T, −) we have G ⊆ Det(T). By Lemma 2, there is 
a unique linear simulation ū : D → U such that ū ∘ d = u. 


If a transition system T is affine, then its determinization can be computed in 
polynomial time. Fixing a basis for the state space S_T (of some dimension n), we 
can represent the transition relation of T in the form R_T = {⟨x, x’⟩ : Ax’ = Bx + c} 
where A, B ∈ Q^{m×n} and c ∈ Q^m (for some m). We can represent functionals 
on S_T by n-dimensional vectors, where the vector v ∈ Q^n corresponds to the 
functional that maps u ↦ v^T u. A linear space of functionals Λ can be represented 
by a system of linear equations Λ = {x : Mx = 0}. The ith row a_i^T x’ = b_i^T x + c_i 
of the system of equations Ax’ = Bx + c can be read as “T is ({b_i^T}, {a_i^T})- 
deterministic.” Thus, the functionals f^T such that T is (Λ, {f^T})-deterministic 
are those that can be written as a linear combination of the rows of A such that 
the corresponding linear combination of the rows of B belongs to Λ; i.e., 


Det({⟨x, x’⟩ : Ax’ = Bx + c}, {f : Mf = 0}) = {d : ∃y. M B^T y = 0 ∧ A^T y = d}. 


A representation of Det(T, Λ) can be computed in polynomial time using Gaussian 
elimination. Since the lattice of linear subspaces of S_T^* has height n, the 
greatest fixpoint of Det(T, −) can be computed in polynomial time. 
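
The fixpoint computation above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the helper names (`rref`, `nullspace`, `det_step`, `det_fixpoint`) are hypothetical, Λ is represented by a list of spanning row vectors rather than by a kernel matrix M, and the constant vector c is omitted because it does not affect Det. The system in the usage lines is made up: it has states (w, x, y) with w’ = x and x’ + y’ = x + y.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (nonzero rows, pivot columns)."""
    m = [[Fraction(v) for v in r] for r in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m[:r], pivots

def nullspace(rows, ncols):
    """Basis of {v in Q^ncols : rows * v = 0}."""
    red, piv = rref(rows)
    basis = []
    for fc in (c for c in range(ncols) if c not in piv):
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for row, pc in zip(red, piv):
            v[pc] = -row[fc]
        basis.append(v)
    return basis

def det_step(A, B, lam):
    """One application of Det(T, -): all A^T y such that B^T y lies in span(lam)."""
    m, n, k = len(A), len(A[0]), len(lam)
    # Unknowns (y, z): B^T y - lam^T z = 0, one equation per state coordinate j.
    system = [[Fraction(B[i][j]) for i in range(m)] +
              [-Fraction(lam[l][j]) for l in range(k)] for j in range(n)]
    ds = [[sum(Fraction(A[i][j]) * v[i] for i in range(m)) for j in range(n)]
          for v in nullspace(system, m + k)]
    return rref(ds)[0]

def det_fixpoint(A, B):
    """Greatest fixpoint of Det(T, -), iterating downward from the full dual space."""
    n = len(A[0])
    lam = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    while True:
        new = det_step(A, B, lam)
        if len(new) == len(lam):  # descending chain: equal dimension means equal space
            return new
        lam = new

A = [[1, 0, 0], [0, 1, 1]]
B = [[0, 1, 0], [0, 1, 1]]
fixpoint = det_fixpoint(A, B)
```

On this input the iteration passes through spaces of dimension 3, 2, and 1, mirroring the shrinking sequence in the text: w is determined after one step but not after two, while x + y stays determined.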


Example 3. Continuing the example from Fig. 1 and Example 2, we consider the 
determinization of the affine transition system in Eq. (2). The rows of the matrix 
on the left-hand side correspond to generators for Det(aff(F), (Q^4)^*): 

Det(aff(F), (Q^4)^*) = span({[1 0 0 0], [0 1 1 0], [0 0 0 1]}) 
Det(aff(F), Det(aff(F), (Q^4)^*)) = span({[0 1 1 0], [0 0 0 1]}) 

which is the greatest fixpoint Det(aff(F)). Intuitively: after one step of aff(F), 
the values of w, x + y, and z are affine functions of the input; after two steps 
x+y and z are affine functions of the input but w is not, since the value of w on 
the second step depends upon the value of x in the first, and x is not an affine 
function of the input. 

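This yields the deterministic reflection ⟨D, d⟩ (pictured in Fig. 1), where the 
simulation d is represented by the matrix with rows [0 1 1 0] and [0 0 0 1]. 
[The displayed transition relation of D is illegible in this copy.] 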


3.4 Rational-Spectrum Reflections of DATS 


In this section, we define rational-spectrum DATS and show that every DATS 
has a rational-spectrum-reflection. 

In the following, it is convenient to work with transition systems that are 
linear rather than affine. We will prove that every deterministic linear transition 
system has a best abstraction with rational spectrum. The result extends to the 
affine case through the use of homogenization: i.e., we embed a (non-empty) affine 
transition system into a linear transition system with one additional dimension, 
such that if we fix that dimension to be 1 then we recover the affine transition 
system. If the transition relation of a DATS is represented in the form Ax’ = 
Bx +c, then its homogenization is simply 


\[
\begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} B & c \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

For a DATS T, we use homog(T) to denote the pair ⟨L, h⟩, consisting of the DLTS 
L resulting from homogenization and the affine simulation h : T → L that maps 
each x ∈ S_T to [x 1]^T (i.e., the affine simulation h formalizes the idea that if we 
fix the extra dimension y to be 1, we recover the original DATS T). 

Let T be a deterministic linear transition system. Since our goal is to analyze 
the asymptotic behavior of T, and all long-running behaviors of T reside entirely 
within dom^ω(T), we are interested in the structure of dom^ω(T) and T’s behavior 
on this set. First, we observe that dom^ω(T) is a linear subspace of S_T and is 
computable. For any k, let T^k denote the linear transition system whose 
transition relation is the k-fold composition of the transition relation of T. 
Consider the descending sequence of linear spaces 


dom(T) ⊇ dom(T^2) ⊇ dom(T^3) ⊇ ... 


(i.e., the set of states from which there are T computations of length 1, length 
2, length 3, ...). Since the space S_T is finite dimensional, this sequence must 
stabilize at some k. Since the states in dom(T^k) have T-computations of any 
length and T is deterministic, we have that dom(T^k) is precisely dom^ω(T). 
Since T is total on dom^ω(T) and the successor of a state in dom^ω(T) must 
also belong to dom^ω(T), T defines a linear map T|_ω : dom^ω(T) → dom^ω(T). In 
this way, we can essentially reduce asymptotic analysis of DATS to asymptotic 
analysis of linear dynamical systems. The asymptotic analysis of linear 
dynamical systems developed in Sects. 4 and 5 requires rational eigenvalues; thus we 
are interested in DATS T such that T|_ω has rational eigenvalues. With this in 
mind, we define spec(T) ≜ spec(T|_ω), and say that T has rational spectrum 
if spec(T) ⊆ Q. Define Q-DLTS to be the subcategory of DLTS with rational 
spectrum, and Q-DATS to be the subcategory of DATS whose homogenization 
lies in Q-DLTS. 


Example 4. Consider the DLTS T with 

\[
R_T \triangleq \left\{
\left\langle \begin{bmatrix} x \\ y \\ z \end{bmatrix},
             \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} \right\rangle :
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \\ 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
\right\}
\]

The bottom-most equation corresponds to a constraint that only vectors where 
the x and y coordinates are equal have successors, so we have: 


dom(T) = {[x y z]^T : x = y} 


Supposing that the x and y coordinates are equal in some pre-state, they are 
equal in the post-state exactly when z = 0, so we have 


dom(T^2) = {[x y z]^T : x = y ∧ z = 0} 


It is easy to check that dom(T^3) = dom(T^2), and therefore dom^ω(T) = dom(T^2). 
The vector [1 1 0]^T is a basis for dom^ω(T), and the matrix representation of T|_ω 
with respect to this basis is [2] (i.e., [1 1 0]^T →_T [2 2 0]^T). Thus we can see 
spec(T) = {2}, and T is a Q-DLTS. 
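
The descending sequence dom(T) ⊇ dom(T^2) ⊇ ... used in this example can be computed mechanically. The sketch below is an illustration only, with hypothetical helper names (`rref`, `nullspace`, `omega_domain`); it assumes the DLTS is given by matrices A and B with transitions {⟨x, x’⟩ : Ax’ = Bx}, and it uses the matrices of Example 4 with the last row of B read as [1 −1 0].

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (nonzero rows, pivot columns)."""
    m = [[Fraction(v) for v in r] for r in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m[:r], pivots

def nullspace(rows, ncols):
    """Basis of {v in Q^ncols : rows * v = 0}."""
    red, piv = rref(rows)
    basis = []
    for fc in (c for c in range(ncols) if c not in piv):
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for row, pc in zip(red, piv):
            v[pc] = -row[fc]
        basis.append(v)
    return basis

def omega_domain(A, B):
    """Basis (row vectors) of dom^omega for the DLTS {(x, x') : A x' = B x}."""
    n = len(A[0])
    basis = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # whole space
    while True:
        r = len(basis)
        # x is in the next space iff B x = A x' for some x' in span(basis).
        system = []
        for i in range(len(A)):
            coeffs_x = [Fraction(v) for v in B[i]]
            coeffs_w = [-sum(Fraction(A[i][j]) * basis[k][j] for j in range(n))
                        for k in range(r)]
            system.append(coeffs_x + coeffs_w)
        xs = [v[:n] for v in nullspace(system, n + r)]
        new_basis = rref(xs)[0]
        if len(new_basis) == len(basis):  # the descending chain has stabilized
            return new_basis
        basis = new_basis

A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
B = [[2, 0, 1], [0, 2, 2], [0, 0, 3], [1, -1, 0]]
dom_omega = omega_domain(A, B)
```

The loop visits spaces of dimension 3, 2, and 1 before stabilizing, matching dom(T), dom(T^2), and dom(T^3) = dom(T^2) in the example.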


Towards an application of Lemma 2, define the generalized rational 
eigenspace of a DLTS T to be 


E_Q(T) ≜ span({f ∈ S_T^* : ∃λ ∈ Q, ∃r ∈ N^+. f ∘ (T|_ω − λ)^r = 0}). 


Lemma 4. Let T be a DLTS, and define ⟨Q, q⟩ ≜ α_{E_Q(T)}(T). Then for any 
Q-DLTS U and any simulation s : T → U, there is a unique simulation s̄ : 
Q → U such that s̄ ∘ q = s. 


While α_{E_Q(T)}(T) satisfies a universal property for Q-DLTS, it does not 
necessarily belong to Q-DLTS itself because it need not be deterministic. However, by 
iteratively interleaving Lemma 4 and determinization as shown in Algorithm 1, 
we arrive at a Q-DLTS-reflection. Example 5 demonstrates how we calculate a 
Q-DLTS-reflection of a particular DLTS. 
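
Example 5. Consider the DLTS T with transition relation 

\[
R_T \triangleq \left\{
\left\langle \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix},
             \begin{bmatrix} w' \\ x' \\ y' \\ z' \end{bmatrix} \right\rangle :
\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{bmatrix}
\begin{bmatrix} w' \\ x' \\ y' \\ z' \end{bmatrix}
=
\begin{bmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 0&0&0&1 \\ 0&0&-1&0 \\ 1&-1&0&0 \end{bmatrix}
\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}
\right\}
\]

We can calculate the ω-domain of T as dom^ω(T) = {[w x y z]^T : w = x}, which has 
a basis B = {[1 1 0 0]^T, [0 0 1 0]^T, [0 0 0 1]^T}. With respect to B, T|_ω corresponds 
to the matrix 

\[
T|_\omega = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\]

and so we have spec(T) = {2, i, −i}. We may calculate E_Q(T) by finding 
(generalized) left eigenvectors with eigenvalue 2, the only rational number in 
spec(T): 

E_Q(T) = span({f ∈ S_T^* : ∃r ∈ N^+. f ∘ (T|_ω − 2I)^r = 0}) 
       = span([1 1 0 0], [1 −1 0 0]) 

Finally, we have ⟨Q, q⟩ = α_{E_Q(T)}(T), where the simulation q is represented by 
the matrix with rows [1 1 0 0] and [1 −1 0 0]. [The displayed transition relation 
of Q is illegible in this copy.] Q is deterministic and has rational spectrum, so 
⟨Q, q⟩ is a Q-DLTS-reflection of T. 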
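
Algorithm 1 loops until the restricted map has rational spectrum, so a test for rational eigenvalues is needed. One possible way to sketch such a test, which is an assumption of this note and not necessarily what the authors do, is to compute the characteristic polynomial with the Faddeev-LeVerrier recurrence and peel off rational roots found by the rational root theorem. All function names are hypothetical; the sample matrix is T|_ω from Example 5.

```python
from fractions import Fraction
from math import gcd

def char_poly(M):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(tI - M), via Faddeev-LeVerrier."""
    n = len(M)
    M = [[Fraction(v) for v in row] for row in M]
    N = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        MN = [[sum(M[i][l] * N[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(MN[i][i] for i in range(n)) / k
        coeffs.append(c)
        N = [[MN[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def evaluate(coeffs, t):
    """Horner evaluation; coefficients are listed from highest degree down."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * t + c
    return acc

def rational_roots(coeffs):
    """Distinct rational roots of a polynomial with rational coefficients."""
    scale = 1
    for c in coeffs:
        scale = scale * c.denominator // gcd(scale, c.denominator)
    ints = [int(c * scale) for c in coeffs]  # same roots, integer coefficients
    roots = set()
    while len(ints) > 1 and ints[-1] == 0:   # factor out powers of t
        roots.add(Fraction(0))
        ints.pop()
    if len(ints) <= 1:
        return roots
    lead, const = abs(ints[0]), abs(ints[-1])
    for p in (d for d in range(1, const + 1) if const % d == 0):
        for q in (d for d in range(1, lead + 1) if lead % d == 0):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                if evaluate(ints, cand) == 0:
                    roots.add(cand)
    return roots

def has_rational_spectrum(M):
    """True iff the characteristic polynomial of M splits into rational linear factors."""
    poly = char_poly(M)
    while len(poly) > 1:
        roots = rational_roots(poly)
        if not roots:
            return False
        r = next(iter(roots))
        quotient = [poly[0]]                 # synthetic division by (t - r)
        for c in poly[1:-1]:
            quotient.append(c + r * quotient[-1])
        poly = quotient
    return True

T_omega = [[2, 0, 0], [0, 0, 1], [0, -1, 0]]
```

For `T_omega` the characteristic polynomial is t^3 − 2t^2 + t − 2, whose only rational root is 2; the remaining factor t^2 + 1 has the roots i and −i, so the loop of Algorithm 1 would run on this input.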


Theorem 1. For any deterministic linear transition system, Algorithm 1 
computes a Q-DLTS-reflection. 


Finally, by homogenization and Theorem 1, we conclude with the desired 
result: 


Corollary 1. Q-DATS is a reflective subcategory of DATS. 


Input : A DLTS T. 
Output : Q-DLTS-reflection of T 

U ← T; 
s ← λx. x;                        /* Invariant: s is a simulation from T to U */ 
while spec(U|_ω) ⊄ Q do 
    ⟨Q, q⟩ ← α_{E_Q(U)}(U);       /* Lemma 4 */ 
    ⟨U, d⟩ ← α_{Det(Q)}(Q);       /* Lemma 3 */ 
    s ← d ∘ q ∘ s; 
return ⟨U, s⟩ 

Algorithm 1: Computation of a Q-DLTS-reflection of a DLTS 


4 Asymptotic Analysis of Linear Dynamical Systems 


This section is concerned with analyzing the behavior of loops of the form 


while (G(x)) do x := Ax, 


where G(x) is an LIA formula and A is a matrix with integer spectrum. Our 
goal is to capture the asymptotic behavior of iterating the map A on an initial 
state x_0 with respect to the formula G. Specifically, we show that 


Theorem 2. For any LIA formula G and any matrix A with integer spectrum, 
there is a periodic sequence of LIA formulas H_0, H_1, H_2, ... such that for any 
initial state x_0 ∈ Q^n, there exists K such that for any k ≥ K, G(A^k x_0) holds if 
and only if H_k(x_0) does. 


Recall that an infinite sequence H_0, H_1, H_2, ... is periodic if it is of the form 
(H_0, H_1, ..., H_P)^ω ≜ H_0, H_1, ..., H_P, H_0, H_1, ..., H_P, ... 


We call the periodic sequence (H_0, H_1, ..., H_P)^ω the characteristic sequence of 
the guard formula G with respect to dynamics matrix A, and denote it by 
χ(G, A). Note that G(A^k x_0) holds for all but finitely many k exactly when 
⋀_{i=0}^{P} H_i(x_0) holds. 

In the remainder of this section, we show how to compute characteristic 
sequences. Let G be an LIA formula and let A be a matrix with integer spectrum. 
To begin, we compute a quantifier-free formula G’ that is equivalent to G (using, 
for example, Cooper’s algorithm [7]). We define χ(G’, A) by recursion on the 
structure of G’. For the logical connectives ∧, ∨, and ¬, characteristic sequences 
are defined pointwise: 


χ(¬H, A) ≜ (¬(χ(H, A)_0), ¬(χ(H, A)_1), ...) 
χ(H_1 ∧ H_2, A) ≜ (χ(H_1, A)_0 ∧ χ(H_2, A)_0, χ(H_1, A)_1 ∧ χ(H_2, A)_1, ...) 
χ(H_1 ∨ H_2, A) ≜ (χ(H_1, A)_0 ∨ χ(H_2, A)_0, χ(H_1, A)_1 ∨ χ(H_2, A)_1, ...) 


It remains to show how χ acts on atomic formulas, which take the form of 
inequalities t_1 ≤ t_2 and divisibility constraints n | t. An important fact that we 
employ in both cases is that for any linear term c^T x over the variables x, we can 
compute a closed form for c^T A^k(x) by symbolically exponentiating A. Since (by 
assumption) A has integer eigenvalues, this closed form has the form (1/Q) p(x, k) 
where Q ∈ N and p is an integer exponential-polynomial term, which takes 
the form 

λ_1^k k^{d_1} a_1^T x + ··· + λ_m^k k^{d_m} a_m^T x                    (3) 

where λ_i ∈ spec(A), d_i ∈ N, and a_i ∈ Z^n.¹ 


Characteristic Sequences for Inequalities. Our method for computing 
characteristic sequences for inequalities is a variation of Tiwari’s method for 
deciding termination of linear loops with real eigenvalues [29]. 

First, suppose that p(x, k) is an integer exponential-polynomial of the form 
in Eq. (3) such that each λ_i is a positive integer. Further suppose that the 
summands are ordered by asymptotic growth, with the dominant term appearing 
earliest in the list; i.e., for i < j we have either λ_i > λ_j, or λ_i = λ_j and d_i > d_j. 
If we imagine that the variables x are fixed to some x_0 ∈ Z^n, then we see that 
p(x_0, k) is either identically zero or has finitely many zeros, and therefore its 
sign is eventually stable. Furthermore, the sign of p(x_0, k) as k tends to ∞ is 
simply the sign of its dominant term, that is, the sign of a_i^T x_0 for the least 
i such that a_i^T x_0 is non-zero. Thus, we may define a function DTA that maps 
any exponential-polynomial term p(x, k) (with positive integral λ_i) to an LIA 
formula such that for any x_0 ∈ Z^n, x_0 ⊨ DTA(p) holds if and only if p(x_0, k) is 
eventually non-negative (p(x_0, k) ≥ 0 for all but finitely many k ∈ N). DTA is 
defined as follows: 


DTA(0) ≜ true 
DTA(λ^k k^d a^T x + p) ≜ a^T x > 0 ∨ (a^T x = 0 ∧ DTA(p)) 
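
The recursive definition of DTA amounts to scanning the summands in order of dominance and returning the sign of the first non-zero coefficient. The following sketch evaluates DTA at a concrete state instead of building an LIA formula, which is a simplification made for this note; the names `dta` and `eventually_nonneg_sampled` and the term encoding `(lam, d, a)` are hypothetical. The sampled check is only a sanity test on a window of large k, not a proof.

```python
def dot(a, x):
    """Inner product of two integer vectors."""
    return sum(ai * xi for ai, xi in zip(a, x))

def dta(terms, x0):
    """Evaluate DTA at state x0.

    terms lists (lam, d, a) for summands lam^k * k^d * (a . x), ordered from most
    to least dominant, with every lam a positive integer.
    """
    for _lam, _d, a in terms:
        value = dot(a, x0)
        if value > 0:
            return True
        if value < 0:
            return False
    return True  # DTA(0) = true

def eventually_nonneg_sampled(terms, x0, lo=50, hi=60):
    """Sample p(x0, k) on a window of large k (adequate only for small inputs)."""
    return all(sum(lam ** k * k ** d * dot(a, x0) for lam, d, a in terms) >= 0
               for k in range(lo, hi))

# p(x, k) = z*k^2 + (2y - z)*k + 2x over the variables (x, y, z).
p = [(1, 2, (0, 0, 1)), (1, 1, (0, 2, -1)), (1, 0, (2, 0, 0))]
samples = [(-5, 0, 1), (3, -1, 0), (0, 0, 0), (-1, 0, 0), (4, 1, 2)]
agreement = [dta(p, s) == eventually_nonneg_sampled(p, s) for s in samples]
```

For example, at (x, y, z) = (−5, 0, 1) the quadratic term wins and `dta` returns `True`, even though p is negative for small k.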
Finally, we define the characteristic sequence of an inequality atom as follows. 


An inequality t_1 ≤ t_2 over the variables x can be written as c^T x + d ≥ 0 for 
c ∈ Z^n and d ∈ Z. Let (1/Q_even) p_even(x, k) and (1/Q_odd) p_odd(x, k) be the closed 
forms of c^T A^{2k}(x) and c^T A^{2k+1}(x), respectively; by splitting into “even” and 
“odd” cases, we ensure that the exponential-polynomial terms p_even(x, k) and 
p_odd(x, k) have only positive λ_i and thus are amenable to the dominant term 
analysis DTA described above. Then we define: 


χ(c^T x + d ≥ 0, A) ≜ (DTA(p_even(x, k) + d·Q_even), DTA(p_odd(x, k) + d·Q_odd))^ω 


¹ Technically, we have (1/Q)(λ_1^k k^{d_1} a_1^T x + ··· + λ_m^k k^{d_m} a_m^T x) = c^T A^k x 
for all k greater than the rank of the highest-rank generalized eigenvector of 0, but 
since we are only interested in the asymptotic behavior of A we can disregard the 
first steps of the computation. 


Example 6. Consider the matrix A and its exponential A^k below: 

\[
A \begin{bmatrix} x \\ y \\ z \\ a \\ b \end{bmatrix}
= \begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -3 & 0 \\
0 & 0 & 0 & 0 & 2
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ a \\ b \end{bmatrix}
\]

\[
A^k \begin{bmatrix} x \\ y \\ z \\ a \\ b \end{bmatrix}
= \begin{bmatrix}
1 & k & \frac{k(k-1)}{2} & 0 & 0 \\
0 & 1 & k & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & (-3)^k & 0 \\
0 & 0 & 0 & 0 & 2^k
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ a \\ b \end{bmatrix}
= \begin{bmatrix}
\frac{1}{2}\left(z k^2 + (2y - z) k + 2x\right) \\
z k + y \\
z \\
(-3)^k a \\
2^k b
\end{bmatrix}
\]

First we compute the characteristic sequence χ(x ≥ 0, A). Applying the 
dominant term analysis to the closed form of x yields 


DTA(zk^2 + (2y − z)k + 2x) = z > 0 
                             ∨ (z = 0 ∧ 2y − z > 0) 
                             ∨ (z = 0 ∧ 2y − z = 0 ∧ x ≥ 0) 


Since the closed form involves only positive exponential terms, we need not split 
into an even and odd case, and we simply have: 


χ(x ≥ 0, A) = (z > 0 ∨ (z = 0 ∧ 2y − z > 0) ∨ (z = 0 ∧ 2y − z = 0 ∧ x ≥ 0))^ω 


Next we compute the characteristic sequence χ(a − b ≥ 0, A), which does 
require a case split. Applying dominant term analysis to the closed form of 
(a − b) yields 


DTA(a·(−3)^{2k} − b·2^{2k}) = a > 0 ∨ (a = 0 ∧ −b ≥ 0) 
DTA(a·(−3)^{2k+1} − b·2^{2k+1}) = −a > 0 ∨ (−a = 0 ∧ −b ≥ 0) 

and thus we have 

χ(a − b ≥ 0, A) = (a > 0 ∨ (a = 0 ∧ −b ≥ 0), −a > 0 ∨ (−a = 0 ∧ −b ≥ 0))^ω. 
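
The even/odd split for the atom a − b ≥ 0 can be checked numerically. The sketch below is a sanity check written for this note, with hypothetical function names; it compares the guard after k steps with the k-th entry of the period-2 sequence, on small inputs and moderately large k where the dominant term has already taken over.

```python
def chi_a_minus_b(a, b):
    """The two formulas of the characteristic sequence, evaluated at (a, b)."""
    h_even = a > 0 or (a == 0 and -b >= 0)
    h_odd = -a > 0 or (-a == 0 and -b >= 0)
    return (h_even, h_odd)

def guard_after(a, b, k):
    """Truth of a - b >= 0 after k iterations of a := -3a, b := 2b."""
    return (-3) ** k * a - 2 ** k * b >= 0

mismatches = [(a, b, k)
              for a in range(-3, 4)
              for b in range(-3, 4)
              for k in range(20, 30)
              if guard_after(a, b, k) != chi_a_minus_b(a, b)[k % 2]]
```

With a = 1 and b = 3, for instance, the guard holds on large even steps and fails on odd steps, which is exactly the alternation recorded by the two formulas.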


Characteristic Sequences for Divisibility Atoms. Last we show how to 
define χ for divisibility atoms n | t. Write the term t as c^T x + d and let the 
closed form of c^T A^k(x) be 


(1/Q)(λ_1^k k^{d_1} a_1^T x + ··· + λ_m^k k^{d_m} a_m^T x). 


The formula n | c^T A^k(x) + d is equivalent to 
Qn | λ_1^k k^{d_1} a_1^T x + ··· + λ_m^k k^{d_m} a_m^T x + Qd. For any i, the sequence 
⟨λ_i^k k^{d_i} mod Qn⟩_{k=0}^∞ is ultimately periodic, since 
(1) ⟨k mod Qn⟩_{k=0}^∞ = (0, 1, ..., Qn − 1)^ω, (2) ⟨λ_i^k mod Qn⟩_{k=0}^∞ is ultimately 
periodic (with period and transient length bounded above by Qn)², and (3) 
ultimately periodic sequences are closed under pointwise product. It follows that 
for each i, there is a periodic sequence of integers ⟨z_{i,k}⟩_{k=0}^∞ that agrees with 
⟨λ_i^k k^{d_i} mod Qn⟩_{k=0}^∞ on all but finitely many terms. Finally, we take 


χ(n | t, A) ≜ ⟨Qn | z_{1,k} a_1^T x + ··· + z_{m,k} a_m^T x + Qd⟩_{k=0}^∞. 


² An infinite sequence s_0, s_1, s_2, ... is ultimately periodic if there exists N such that 
s_N, s_{N+1}, s_{N+2}, ... is a periodic sequence. We call N the transient length of this 
sequence. 


Example 7. Consider the matrix A and the closed form of its powers below: 

\[
A \begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
\qquad
A^k \begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5^k \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

We show the characteristic sequences for some divisibility atoms w.r.t. A: 


χ(3 | x, A) = (3 | x, 3 | x + y, 3 | x + 2y)^ω 
χ(3 | x + z, A) = (3 | x + z, 3 | x + y + 2z, 3 | x + 2y + z, 3 | x + 2z, 3 | x + y + z, 3 | x + 2y + 2z)^ω 
χ(3 | z, A) = (3 | z, 3 | 2z)^ω 
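
The periodic coefficient sequences used for divisibility atoms can be found by a bounded search. The sketch below is illustrative only: `residues` and `transient_and_period` are hypothetical names, and the search relies on the bounds stated in the text (transient at most the modulus, period at most the square of the modulus) and checks agreement on a finite window rather than proving periodicity.

```python
def residues(lam, d, modulus, count):
    """The first `count` values of lam^k * k^d mod modulus, for k = 0, 1, 2, ..."""
    return [(pow(lam, k, modulus) * pow(k, d, modulus)) % modulus for k in range(count)]

def transient_and_period(lam, d, modulus):
    """Smallest transient, then smallest period, consistent on a bounded window."""
    max_period = modulus * modulus
    horizon = modulus + 2 * max_period
    seq = residues(lam, d, modulus, horizon + max_period)
    for transient in range(modulus + 1):
        for period in range(1, max_period + 1):
            if all(seq[k] == seq[k + period] for k in range(transient, horizon)):
                return transient, period
    raise AssertionError("no period found within the assumed bounds")

# Coefficients behind Example 7: 5^k mod 3 for the atom 3 | z, and k mod 3 for 3 | x.
five_mod_three = residues(5, 0, 3, 4)
```

Here `five_mod_three` alternates between 1 and 2, which is why the characteristic sequence of 3 | z alternates between 3 | z and 3 | 2z.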


5 A Conditional Termination Analysis for Programs 


This section demonstrates how the results from Sects. 3 and 4 can be combined 
to yield a conditional termination analysis that applies to general programs. 


Integer-Spectrum Restriction for Q-DLTS. Section 3 gives a way to compute 
a Q-DATS-reflection of any transition formula. Yet the analysis we developed 
in Sect. 4 only applies to linear dynamical systems with integer spectrum. 
We now show how to bridge the gap. Let V be a Q-DATS. As discussed in 
Sect. 3.4, we may homogenize V to obtain a Q-DLTS T. Define Z(T) to be the 
space spanned by the generalized (right) eigenvectors of T|_ω that correspond to 
integer eigenvalues: 


Z(T) ≜ span({x ∈ dom^ω(T) : ∃r ∈ N^+, λ ∈ Z. (T|_ω − λ)^r (x) = 0}) 


Since Z(T) is invariant under T|_ω and thus T, T defines a linear map T|_Z : 
Z(T) → Z(T), and by construction T|_Z has integer spectrum. The following 
lemma justifies the restriction of our attention to the subspace Z(T). 


Lemma 5. Let F be a transition formula, let ⟨V, s⟩ be a Q-DATS-reflection of 
F, and let ⟨T, h⟩ = homog(V). For any state v ∈ dom^ω(F), we have h(s(v)) ∈ 
Z(T). 


Example 8. The following loop computes the number of trailing 0’s in the binary 
representation of integer x, and is shown with its corresponding transition formula: 


while (x % 2 == 0) do 
    x = x / 2 
    c = c + 1 

F(x, c, x’, c’) ≜ (2 | x) ∧ (x − 1 ≤ 2x’ ∧ 2x’ ≤ x) ∧ (c’ = c + 1) 


The homogenization of the Q-DATS-reflection of F is the Q-DLTS T, where: 

\[
R_T \triangleq \left\{
\left\langle \begin{bmatrix} x \\ c \\ h \end{bmatrix},
             \begin{bmatrix} x' \\ c' \\ h' \end{bmatrix} \right\rangle :
\begin{bmatrix} x' \\ c' \\ h' \end{bmatrix}
=
\begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ c \\ h \end{bmatrix}
\right\}
\]

The ω-domain of T is the whole state space Q^3. Since the eigenvector [1 0 0]^T of 
the transition matrix corresponds to a non-integer eigenvalue 1/2, the x-coordinate 
of states in Z(T) must be 0; i.e., Z(T) = {(x, c, h) : x = 0}. We conclude that 
x ≠ 0 is a sufficient condition for the loop to terminate. 
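
The conclusion of Example 8 is easy to observe by running the loop. The sketch below is a direct transcription of the loop into Python for illustration; the `fuel` parameter is a hypothetical step bound added here so that the non-terminating case x = 0 can be observed, and it is not part of the original program.

```python
def trailing_zeros(x, fuel=1000):
    """Run the loop of Example 8; return c, or None if the step bound is exhausted."""
    c = 0
    while x % 2 == 0:
        if fuel == 0:
            return None  # still looping after `fuel` iterations
        x //= 2          # exact, because the guard ensures x is even
        c += 1
        fuel -= 1
    return c

results = {x: trailing_zeros(x) for x in (8, 12, 7, -8, 0)}
```

Every non-zero input leaves the loop after finitely many halvings, while x = 0 stays at 0 forever, consistent with x ≠ 0 being a sufficient condition for termination.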

Input : A transition formula F(x, x’) ∈ TF in linear integer arithmetic. 
Output : A mortal precondition mp(F) for F. 

A ← aff(F);                          /* Affine hull [26]; Lemma 1 */ 
⟨D, d⟩ ← α_{Det(A)}(A);              /* Determinize; Lemma 3 */ 
⟨V, q⟩ ← Q-DATS-reflection of D;     /* Algorithm 1 */ 
v ← q ∘ d;                           /* ⟨V, v⟩ is a Q-DATS-reflection of F */ 
⟨T, h⟩ ← homog(V);                   /* Homogenization of V */ 
t ← h ∘ v;                           /* t is an affine simulation F → T */ 
p ← (any) linear projection of S_T onto Z(T); 
C ← matrix such that Cw = 0 ⟺ w ∈ Z(T); 
Let G(w) ≜ ∃x, x’. F(x, x’) ∧ w = p(t(x)) ∧ Ct(x) = 0; 
(H_0(w), ..., H_P(w))^ω ← χ(G(w), T|_Z);   /* Section 4 */ 
return ¬((⋀_{i=0}^{P} H_i(p(t(x)))) ∧ Ct(x) = 0) 

Algorithm 2: Procedure for computing mp(F). 



The Mortal Precondition Operator. Algorithm 2 shows how to compute 
a mortal precondition for an LIA transition formula F(x, x’) (i.e., a sufficient 
condition under which F terminates). The algorithm operates as follows. First, we 
compute a Q-DATS-reflection of F, and homogenize to get a Q-DLTS T and 
an affine simulation t : F → T. Let p denote an (arbitrary) projection from S_T 
onto Z(T) (so p is a simulation from T to T|_Z). We then compute an LIA formula 
G which represents the states w of T|_Z such that there is some v ∈ dom(F) such 
that t(v) ∈ Z(T) and p(t(v)) = w. Letting (H_0, ..., H_P)^ω be the characteristic 
sequence χ(G, T|_Z), we have that for any v ∈ dom^ω(F), t(v) must belong to 
Z(T) and p(t(v)) satisfies each H_i, so we define 


mp(F) ≜ {v ∈ S_F : t(v) ∉ Z(T) or v ⊭ ⋀_{i=0}^{P} H_i(p(t(x)))}. 


Within the context of the algorithm, we suppose that states of F are represented 
by n-dimensional vectors, states of T are represented as m-dimensional 
vectors, and states of T|_Z are represented as q-dimensional vectors. The affine 
simulation t is represented in the form x ↦ Ax + b, where A ∈ Z^{m×n} and b ∈ Z^m, 
the projection p as a Z^{q×m} matrix, and the linear map T|_Z as a Q^{q×q} matrix. The 
fact that p and t have all integer (rather than rational) entries is without loss of 
generality, since any simulation can be scaled by the least common denominator 
of its entries. 


Theorem 3 (Soundness). For any transition formula F, for any state s such 
that s ∈ mp(F), we have s ∉ dom^ω(F). 


Proof. Let T, t, p, C, G, and H_0, ..., H_P be as in Algorithm 2. We prove the 
contrapositive: we assume v ∈ dom^ω(F) and prove v ∉ mp(F), or equivalently 
v ⊨ H_i(p(t(x))) for each i and t(v) ∈ Z(T). We have t(v) ∈ Z(T) by Lemma 5, 
so it remains only to show that v ⊨ H_i(p(t(x))) for each i. 

Since v ∈ dom^ω(F), there exists an infinite trajectory of F starting from 
v: v = v_0 →_F v_1 →_F v_2 →_F .... For any j, let w_j = T|_Z^j (p(t(v))). Since p ∘ t 
is an (affine) simulation, we have w_j = p(t(v_j)) for all j. It follows that for 
any j, we have [v_j, v_{j+1}] ⊨ F(x, x’) ∧ w_j = p(t(x)) ∧ Ct(x) = 0, and so 
G(w_j) ≜ ∃x, x’. F(x, x’) ∧ w_j = p(t(x)) ∧ Ct(x) = 0 holds for all j. By Theorem 2, 
v ⊨ H_i(p(t(x))) holds for all H_i. 


The proof of soundness requires only that we can compute Q-DATS-abstractions 
of transition formulas. The following is the culmination of our development 
of Q-DATS-reflections: 


Theorem 4 (Monotonicity). For any transition formulas F_1 and F_2 such that 
F_1 ⊨ F_2, we have mp(F_2) ⊨ mp(F_1). 


The desire for monotonicity is inspired by the principle that changes to a pro- 
gram should have a predictable impact on its analysis [34]. Monotonicity guaran- 
tees that more information into the analysis always leads to better results—for 
example, if a user annotates a procedure with pre-conditions or adds loop invari- 
ants into the program, our termination analysis can only produce weaker (that 
is, better) preconditions for termination. Moreover, in the context of this work, 
monotonicity also guarantees that if we cannot prove termination using the mp 
operator that we defined, then any linear abstraction of the loop has reachable 
non-terminating states. 


6 Evaluation 


Section 5 shows how to compute mortal preconditions for transition formulas. 
Using the framework of algebraic termination analysis [34], we can “lift” the 
analysis to compute mortal preconditions for whole programs. The essential idea 
is to compute summaries for loops and procedures in “bottom-up” fashion, apply 
the mortal precondition operator from Sect. 5 to each loop body summary, and 
then propagate the mortal preconditions for the loops back to the entry of the 
program (see [34] for more details). We can verify that a program terminates by 
using an SMT solver to check that its mortal precondition is valid. 

We have implemented Algorithm 2 as a mortal precondition operator mp_LR 
(“mortal precondition via Linear Reflections”) in ComPACT, a tool that 
implements the termination analysis framework presented in [34]. We compare the 
performance of our analysis against 2LS [5], Ultimate Automizer [10] and 
CPAchecker [23], the top three competitors in the termination category of the 
Competition on Software Verification (SV-COMP) 2020. 

Experiments are run on a virtual machine with Ubuntu 18.04, with a single- 
core Intel Core i7-9750H @ 2.60 GHz CPU and 8GB of RAM. All tools were run 
with a time limit of 10 min. 


Benchmarks. We tested on a suite of 263 programs divided into 4 categories. 
The termination and recursive suites contain small programs with challenging 
termination arguments, while the polybench suite contains larger real-world 
programs that have relatively simple termination arguments. The termination 
category consists of the non-recursive, terminating benchmarks from SV-COMP 
2020 in the Termination-MainControlFlow suite. The recursive category 
consists of the recursive, terminating benchmarks from the recursive directory and 
Termination-MainControlFlow. Note that 2LS does not handle recursive 
programs, so we exclude it from the recursive category. Finally, we created a new 
test suite linear consisting of programs with terminating linear abstractions. 
This suite is designed to exercise the capabilities of mp_LR, and includes 
all examples from Ben-Amram and Genaim’s article [1] on multi-phase ranking 
functions, loops with disjunctive and/or modular arithmetic guards, and loops 
that model integer division and remainder calculation. 


Table 1. Termination verification benchmarks; time in seconds. 

Benchmark   | #tasks | mp_LR             | 2LS               | UAutomizer        | CPAChecker 
            |        | #correct | time   | #correct | time   | #correct | time   | #correct | time 
Termination | 171    | 98       | 100.8  | 115      | 1966.0 | 161      | 4772.2 | 126      | 12108.6 
Recursive   | 42     | 4        | 51.0   | -        | -      | 30       | 1781.7 | 23       | 530.6 
Polybench   | 30     | 30       | 128.3  | 0        | 7602.7 | 0        | 16241.6| 0        | 4035.8 
Linear      | 20     | 20       | 37.0   | 6        | 17.6   | 8        | 2841.3 | 3        | 3470.7 
Total       | 263    | 152      | 317.1  | 121      | 9586.3 | 199      | 25636.8| 152      | 20145.7 


Table 2. Comparing mp_LR and ComPACT; time in seconds. 

Benchmark   | #tasks | mp_LR             | ComPACT−mp_LR     | ComPACT+mp_LR 
            |        | #correct | time   | #correct | time   | #correct | time 
Termination | 171    | 98       | 100.8  | 141      | 118.4  | 146      | 114.4 
Recursive   | 42     | 4        | 51.0   | 31       | 95.4   | 32       | 94.6 
Polybench   | 30     | 30       | 128.3  | 30       | 179.6  | 30       | 179.1 
Linear      | 20     | 20       | 37.0   | 15       | 116.5  | 20       | 65.1 
Total       | 263    | 152      | 317.1  | 217      | 509.9  | 228      | 453.3 


How Does Our Analysis Compare with the State-of-the-Art? The comparison of 
ComPACT using the mp_LR operator against state-of-the-art termination 
analysis tools is shown in Table 1. ComPACT with mp_LR is competitive with (but not 
dominating) leading tools in terms of number of tasks solved across the suite, 
and uses substantially less time. The mp_LR analysis is least successful on the 
termination and recursive suites, which are designed to have difficult 
termination arguments. Most competitive tools use a portfolio of different 
termination techniques to approach such problems (e.g., Ultimate Automizer synthesizes 
linear, nested, multi-phase, lexicographic and piecewise ranking functions); we 
investigate the use of mp_LR in a portfolio solver in the following. 

ComPACT with mp_LR solves all tasks in the polybench suite, which contains 
numerical programs that have simple termination arguments, but which 
are larger than the SV-COMP tasks. 2LS, Ultimate Automizer, and CPAChecker 


exhaust time or memory limits on all tasks. Nested loops are a problematic pat- 
tern that appears in these programs, e.g., 


for(int i = 0; i < 4096; i += step) 
for (int j = 0; j < 4096; j += step) 
// no modifications to i, j, or step 


For such loops, mp_LR is guaranteed to synthesize a conditional termination 
argument that is at least as weak as step > 0 (regardless of the contents of the 
inner loop) by monotonicity and the fact that the loop body formula entails 
i < 4096 ∧ i’ = i + step ∧ step’ = step. Ultimate Automizer, CPAChecker, and 
2LS cannot make such theoretical guarantees. 

The linear suite demonstrates that mp_LR is capable of proving termination 
of programs that lie outside the boundaries of the other tools. 


Can Our Analysis Improve a Portfolio Solver? We compare mp_LR and ComPACT 
in Table 2. The columns correspond to running ComPACT with the following 
options: excluding the portfolio from [34] (mp_LR), including the portfolio 
but excluding mp_LR (ComPACT−mp_LR), and including the portfolio and 
mp_LR (ComPACT+mp_LR). ComPACT+mp_LR can solve 11 additional tasks 
over ComPACT−mp_LR while adding negligible runtime overhead. In fact, adding 
mp_LR to the portfolio decreases the amount of time it takes for ComPACT to 
complete all benchmark suites. Note that the combined tool solves the 
most termination tasks among all the tools we tested, both overall and for each 
individual suite except the termination category. 


7 Related Work 


Termination Analysis of Linear Loops. The universal termination problem for 
linear loops (or total deterministic affine transition systems, in the terminology 
of Sect. 4) was posed by Tiwari [29]. The case of linear loops over the reals was 
resolved by Tiwari [29], over the rationals by Braverman [4], and finally over 
the integers by Hosseini et al. [14]. In principle, we can combine any of these 
techniques with our algorithm for computing DATS-reflections of transition 
formulas to yield a sound (but incomplete) termination analysis. The significance 
of computing a DATS-reflection (rather than just “some” abstraction) is that it
provides an algorithmic completeness result: if it is possible to prove termination 
of a loop by exhibiting a terminating linear dynamical system that simulates it, 
the algorithm will prove termination. 

The method introduced in Sect. 4 to compute characteristic sequences of
inequalities is based on the method that Tiwari used to prove decidability of 
the universal termination problem for linear loops with (positive) real spectra 
[29]. Tiwari’s condition of having real spectra is strictly more general than the 
integer spectra used by our procedure; requiring that the spectrum be integer 
allows us to express the DTA procedure in linear integer arithmetic rather than
real arithmetic. Similar procedures appear also in [12,18]. We note in particular 
that our results in Sects. 4 and 5 subsume Frohn and Giesl’s decision procedure
for universal termination for upper-triangular linear loops [12]; since every
rational upper-triangular linear loop has a rational spectrum (and is therefore a 
QDATS), the mortal precondition computed for any rational upper-triangular 
linear loop is valid iff the loop is universally terminating. 


Linear Abstractions. The formulation of “best abstractions” using reflective sub- 
categories is based on the framework developed in [17]. A variation of this method 
was used in the context of invariant generation, based on computing (weak) 
reflections of linear rational arithmetic formulas in the category of rational vec- 
tor addition systems [27]. This paper is the first to apply the idea to termination 
analysis. 

A method for extracting polynomial recurrence (in)equations that are 
entailed by a transition formula appears in [16]. The algorithm can also be 
applied to compute a TDATS-abstraction of a transition formula. The pro- 
cedure does not guarantee that the TDATS-abstraction is a reflection (best 
abstraction); Proposition 1 demonstrates that no such procedure exists. In this 
paper, we generalize the model to allow non-total transition systems, and show 
that best abstractions do exist. The techniques from Sect. 3 can be used for
invariant generation, improving upon the methods of [16]. 

Kincaid et al. show that the category of linear dynamical systems with periodic
rational spectrum is a reflective subcategory of the category of linear dynamical
systems [18]. A complex number $n$ is periodic rational if $n^p$ is rational for
some $p \in \mathbb{Z}^{>0}$. Combining this result with the technique from Sect. 3 yields the
result that the category of DATS with periodic rational spectrum is a reflec- 
tive subcategory of TF. The decision procedure from Sect. 4 extends easily to 
the periodic rational case, which results in a strictly more powerful decision 
procedure. 


Termination Analysis. Termination analysis, and in particular conditional ter- 
mination analysis, has been widely studied. Work on the subject can be divided 
into practical termination analyses that work on real programs (but offer few 
theoretical guarantees) [2,6,8,11,13,20,30-32], and work on simplified models
(such as linear, octagonal, and polyhedral loops) with strong guarantees (but 
cannot be applied directly to real programs) [1,3,4,14,21,25,29]. This paper 
aims to help bridge the gap between the two, by showing how to apply analyses 
for linear loops to general programs, while preserving some of their desirable 
theoretical properties, in particular monotonicity. 
Abstract. We present a novel decision tree-based synthesis algorithm
of ranking functions for verifying program termination. Our algorithm is 
integrated into the workflow of CounterExample Guided Inductive Syn- 
thesis (CEGIS). CEGIS is an iterative learning model where, at each 
iteration, (1) a synthesizer synthesizes a candidate solution from the 
current examples, and (2) a validator accepts the candidate solution if 
it is correct, or rejects it providing counterexamples as part of the next 
examples. Our main novelty is in the design of a synthesizer: building 
on top of a usual decision tree learning algorithm, our algorithm detects 
cycles in a set of example transitions and uses them for refining decision 
trees. We have implemented the proposed method and obtained promis- 
ing experimental results on existing benchmark sets of (non-)termination 
verification problems that require synthesis of piecewise-defined lexico- 
graphic affine ranking functions. 


1 Introduction 


Termination Verification by Ranking Functions and CEGIS. Termination verifi- 
cation is a fundamental but challenging problem in program analysis. Termina- 
tion verification usually involves some well-foundedness arguments. Among them 
are those methods which synthesize ranking functions [16]: a ranking function 
assigns a natural number (or an ordinal, more generally) to each program state, 
in such a way that the assigned values strictly decrease along transitions. The
existence of such a ranking function witnesses termination, where well-foundedness
of the set of natural numbers (or ordinals) is crucially used. 

We study synthesis of ranking functions by CounterExample Guided Induc- 
tive Synthesis (CEGIS) [29]. CEGIS is an iterative learning model in which a 
synthesizer and a validator interact to find solutions for given constraints. At 
each iteration, (1) a synthesizer tries to find a candidate solution from the cur- 
rent examples, and (2) a validator accepts the candidate solution if it is correct, 
or rejects it providing counterexamples. These counterexamples are then used 
as part of the next examples (Fig. 1). 
[Figure 1: the CEGIS loop. The synthesizer looks for a candidate solution $\sigma$ that is consistent with the current set $E$ of examples and sends it to the validator, or reports that no solution exists. The validator checks whether $\sigma$ is a solution for the constraints $C$. If yes, the answer is $\sigma$; if no, the validator adds new examples to $E$.]

Fig. 1. The CEGIS architecture


CEGIS has been applied not only to program verification tasks (synthesis of 
inductive invariants [17,18,25,26], that of ranking functions [19], etc.) but also 
to constraint solving (for CHC [12,14,28,36], for pwCSP(T) [30,31], etc.). The
success of CEGIS is attributed to the degree of freedom that synthesizers enjoy. 
In CEGIS, synthesizers receive a set of individual examples that synthesizers 
can use in various creative and speculative manners (such as machine learning). 
In contrast, in other methods such as [5-8,24,27], synthesizers receive logical 
constraints that are much more binding. 


Segmented Synthesis in CEGIS-Based Termination Analysis. The choice of a 
candidate space for candidate solutions $\sigma$ is important in CEGIS. A candidate
space should be expressive: by limiting a candidate space, the CEGIS architec- 
ture may miss a genuine solution. At the same time, complexity should be low: 
a larger candidate space tends to be more expensive for synthesizers to handle. 

This tradeoff also arises in the choice of the type of examples: using an expressive
example type, a small number of examples can prune a large portion of the can- 
didate space; however, finding such expressive examples tends to be expensive. 

In this paper, we use piecewise affine functions as our candidate space for 
ranking functions. Piecewise affine functions are functions of the form 


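\[
f(\tilde{x}) =
\begin{cases}
\tilde{a}_1 \cdot \tilde{x} + b_1 & \tilde{x} \in L_1 \\
\quad\vdots \\
\tilde{a}_n \cdot \tilde{x} + b_n & \tilde{x} \in L_n
\end{cases}
\tag{1}
\]

where $\{L_1, \dots, L_n\}$ is a partition of the domain of $f(\tilde{x})$ such that each $L_i$ is a
polyhedron (i.e. a conjunction of linear inequalities). We say segmented synthesis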
to emphasize that our synthesis targets are piecewise affine functions with case 
distinction. Piecewise affine functions strike a good balance between expressiveness
and complexity: the tasks of synthesizers and validators can be reduced
to linear programming (LP); at the same time, case distinction allows them to 
model a variety of situations, especially where there are discontinuities in the 
function values and/or derivatives. 
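To make the shape of (1) concrete, the following sketch stores a piecewise affine function as a list of pieces, each with a polyhedral region and an affine body. The class and function names (`Halfspace`, `Piece`, `evaluate`) are assumptions introduced for this illustration and do not come from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Sequence


def dot(a: Sequence[int], x: Sequence[int]) -> int:
    return sum(ai * xi for ai, xi in zip(a, x))


@dataclass(frozen=True)
class Halfspace:
    """The set of points x with coeffs . x + const >= 0."""
    coeffs: Sequence[int]
    const: int

    def contains(self, x: Sequence[int]) -> bool:
        return dot(self.coeffs, x) + self.const >= 0


@dataclass(frozen=True)
class Piece:
    region: Sequence[Halfspace]  # polyhedron L_i as a conjunction of halfspaces
    coeffs: Sequence[int]        # coefficient vector a_i
    const: int                   # offset b_i


def evaluate(pieces: Sequence[Piece], x: Sequence[int]) -> int:
    """Return a_i . x + b_i for the first piece whose region contains x."""
    for p in pieces:
        if all(h.contains(x) for h in p.region):
            return dot(p.coeffs, x) + p.const
    raise ValueError("the regions do not cover this point")


# |x| over the integers: x on x >= 0, and -x on -x - 1 >= 0 (that is, x <= -1).
ABS = [
    Piece(region=[Halfspace((1,), 0)], coeffs=(1,), const=0),
    Piece(region=[Halfspace((-1,), -1)], coeffs=(-1,), const=0),
]
```

The absolute value is a typical instance: it is not affine, but it becomes affine on each of two polyhedral regions.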

We use transition examples as our example type (Table 1). Transition exam- 
ples are pairs of program states that represent transitions; they are much cheaper 
to handle compared to trace examples (finite traces of executions until termi- 
nation) used e.g. in [15,33]. The current work is the first to pursue segmented 
synthesis of ranking functions with transition examples; see Table 1. 
Table 1. Ranking function synthesis by CEGIS

Candidate space \ Example type      | Trace examples | Transition examples
Affine ranking functions            | [15,33]        | [19]
Piecewise affine ranking functions  | [15,33]        | Our method


[Figure 2, two panels: (a) For invariants; (b) For ranking functions. Panel (b) shows the halfspace boundaries $h_1$ and $h_2$, transition examples (including $e_4$), and the leaf labels $f_1(\tilde{x})$, $f_2(\tilde{x})$, $f_3(\tilde{x})$.]

Fig. 2. Decision tree learning


Decision Tree Learning for CEGIS-Based Termination Analysis: a Challenge. In 
this paper, we represent piecewise affine functions (1) by the data structure of 
decision trees. The data structure suits the CEGIS architecture (Fig. 1): iterative 
refinement of candidate solutions can be naturally expressed by growing decision 
trees. The main challenge of this paper is the design of an effective synthesizer 
for decision trees—such a synthesizer learns decision trees from examples. 

In fact, decision tree learning in the CEGIS architecture has already been 
actively pursued, for the synthesis of invariants as opposed to ranking func- 
tions [12,14,18,22,36]. It is therefore a natural idea to adapt the decision tree 
learning algorithms used there, from invariants to ranking functions. However, 
we find that a naive adaptation of those algorithms for invariants does not suffice: 
they are good at handling state examples that appear in CEGIS for invariants; 
but they are not good at handling transition examples. 

More specifically, when decision tree learning is applied to invariant syn- 
thesis (Fig. 2a), examples are given in the form of program states labeled as 
positive or negative. Decision trees are then built by iteratively selecting the 
best halfspaces—where “best” is in terms of some quality measures—until each 
leaf contains examples with the same label. One common quality measure used 
here is an information-theoretic notion of information gain. 

We extend this from invariant synthesis to ranking function synthesis where 
examples are given by transitions instead of states (Fig. 2b). In this case, a 
major challenge is to cope with examples that cross a border of the current 
segmentation, such as the transition $e_4$ crossing the border $h_1$ in Fig. 2b. Our
decision tree learning algorithm should handle such crossing examples, taking
into account the constraints imposed on the leaf labels affected by those examples
(the affected leaf labels are $f_1(\tilde{x})$ and $f_3(\tilde{x})$ in the case of $e_4$).


Our Algorithm: Cycle-Based Decision Tree Learning for Transition Examples. 
We use what we call the cycle detection theorem (Theorem 17) as a theoretical 
tool to handle such crossing examples. The theorem claims the following: if 
there is no piecewise affine ranking function with the current segmentation of 
the domain (such as the one in Fig. 2b given by $h_1$ and $h_2$), then this must be
caused by a certain type of cycle of constraints, which we call an implicit cycle. 

In our decision tree learning algorithm, when we do not find a piecewise affine 
ranking function with the current segmentation, we find an implicit cycle and 
refine the segmentation to break the cycle. Once all the implicit cycles are gone, 
the cycle detection theorem guarantees the existence of a candidate piecewise 
affine ranking function with the segmentation. 

We integrate this decision tree learning algorithm in the CEGIS architecture 
(Fig. 1) and use it as a synthesizer. Our implementation of this framework gives 
promising experimental results on existing benchmark sets. 


Contribution. Our contribution is summarized as follows. 


— We provide a decision tree-based synthesizer for ranking functions integrated 
into the CEGIS architecture. Our synthesizer uses transition examples to find 
candidate piecewise affine ranking functions. A major challenge here, namely 
handling constraints arising from crossing examples, is coped with by our 
theoretical observation of the cycle detection theorem. 

— We implement our synthesizer for ranking functions in MUVAL
and report our experience of using MUVAL for termination and non-termination
analysis. The experimental results show that MUVAL’s performance
is comparable to state-of-the-art termination analyzers [7,10,13,21]
from Termination Competition 2020, and that MUVAL can prove (non-)termination
of some benchmarks with which other analyzers struggle.


Organization. Section 2 gives an overview of our method via examples.
Section 3 explains our target class of predicate constraint satisfaction problems
and how to encode (non-)termination problems into such constraints. In Sect. 4,
we review the CEGIS architecture, and then explain simplification of examples into
positive/negative examples. Section 5 proposes our main contribution, our decision
tree-based ranking function synthesizer. Section 6 shows our implementation
and experimental results. Related work is discussed in Sect. 7, and we conclude
in Sect. 8.


2 Preview by Examples 


We present a preview of our method using concrete examples. We start with an 
overview of the general CEGIS architecture, after which we proceed to our main 
contribution, namely a decision tree learning algorithm for transition examples. 


Decision Tree Learning in CEGIS-Based Termination Analysis 79 


2.1 Termination Verification by CEGIS 


Our method follows the usual workflow of termination verification by CEGIS. 
It works as follows: given a program, we encode the termination problem into 
a constraint solving problem, and then use the CEGIS architecture to solve the 
constraint solving problem. 


Encoding the Termination Problem. The first step of our method is to encode 
the termination problem as the set C of constraints. 


Example 1. As a running example, consider the following C program. 
while(x != 0) { if (x < 0) { x++; } else { x--; } } 


The termination problem is encoded as the following constraints. 


\[ x < 0 \wedge x' = x + 1 \implies R(x, x') \tag{2} \]
\[ \neg(x \le 0) \wedge x' = x - 1 \implies R(x, x') \tag{3} \]


Here, R is a predicate variable representing a well-founded relation, and term 
variables x, x’ are universally quantified implicitly. 


The set C of constraints claims that the transition relation for the given 
program is subsumed by a well-founded relation. So, verifying termination is now 
rephrased as the existence of a solution for C. Note that we omitted constraints 
for invariants for simplicity in this example (see Sect. 3 for the full encoding). 
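The claim that the transition relation is subsumed by a well-founded relation can be checked on concrete runs. The sketch below simulates the program of Example 1 and tests each step against the relation induced by the ranking function |x|. The helper names and the tested range are assumptions made for this illustration, and a bounded test is of course not a proof.

```python
from typing import List


def run(x0: int) -> List[int]:
    """States visited by: while (x != 0) { if (x < 0) x++; else x--; }"""
    trace = [x0]
    x = x0
    while x != 0:
        x = x + 1 if x < 0 else x - 1
        trace.append(x)
    return trace


def in_relation(x: int, x_next: int) -> bool:
    """Relation induced by f(x) = |x|: f decreases strictly and is non-negative."""
    return abs(x) > abs(x_next) and abs(x) >= 0


def check_range(lo: int, hi: int) -> bool:
    """Every step of every run started in [lo, hi] lies in the relation."""
    for x0 in range(lo, hi + 1):
        trace = run(x0)
        if not all(in_relation(a, b) for a, b in zip(trace, trace[1:])):
            return False
    return True
```

Because |x| drops by exactly one per iteration, a run started at x0 takes |x0| steps, which is the kind of bound a ranking function provides.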


Constraint solving by CEGIS. The next step is to solve C by CEGIS.

In the CEGIS architecture, a synthesizer and a validator iteratively exchange a set E of examples and a candidate solution R(x, x') for C. At the moment, we present a rough sketch of CEGIS, leaving the details of our implementation to Sect. 2.2.

[Figure 3: messages exchanged between the synthesizer and the validator in steps (i)-(vi), including the candidate $R(x, x') = x > x' \wedge x \ge 0$, the example set $E = \{R(1,0), R(-2,-1)\}$, and the final candidate $R(x, x') = |x| > |x'| \wedge |x| \ge 0$.]

Fig. 3. An example of CEGIS iterations


Example 2. Figure 3 shows how the CEGIS architecture solves the set C of
constraints shown in (2) and (3). Figure 3 consists of three pairs of interactions 
(i)-(vi) between a synthesizer and a validator. 


(i) The synthesizer takes $E = \emptyset$ as a set of examples and returns a candidate
solution $R(x, x') = \bot$ synthesized from $E$. In general, candidate solutions
are required to satisfy all constraints in $E$, but the requirement is vacuously
true in this case.

(ii) The validator receives the candidate solution and finds out that the candidate
solution is not a genuine solution. The validator finds that the assignment
$x = 1$, $x' = 0$ is a counterexample for (3), and thus adds $R(1,0)$ to $E$
to prevent the same candidate solution in the next iteration.

(iii) The synthesizer receives the updated set $E = \{R(1,0)\}$ of examples, finds
a ranking function $f(x) = x$ for $E$ (i.e. for the transition from $x = 1$ to
$x' = 0$), and returns a candidate solution $R(x, x') = x > x' \wedge x \ge 0$.

(iv) The validator checks the candidate solution, finds a counterexample $x = -2$,
$x' = -1$ for (2), and adds $R(-2,-1)$ to $E$.

(v) The synthesizer finds a ranking function $f(x) = |x|$ for $E$ and returns
$R(x, x') = |x| > |x'| \wedge |x| \ge 0$ as a candidate solution. Note that the
synthesizer has to synthesize a piecewise affine function here, but details
are deferred to Sect. 2.2.

(vi) The validator accepts the candidate solution because it is a genuine solution
for $C$.
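The interaction in Example 2 can be mimicked by a very small CEGIS loop. The sketch below is only an illustration under strong assumptions: the synthesizer picks from a fixed, hand-written list of candidates (instead of solving constraints), the validator tests a bounded range of states (instead of calling an SMT solver), and all names are hypothetical. The counterexamples it finds therefore differ from the ones shown in Fig. 3.

```python
from typing import Callable, List, Optional, Tuple

Transition = Tuple[int, int]
Relation = Callable[[int, int], bool]


def step(x: int) -> Optional[int]:
    """One iteration of: while (x != 0) { if (x < 0) x++; else x--; }"""
    if x == 0:
        return None
    return x + 1 if x < 0 else x - 1


def from_ranking(f: Callable[[int], int]) -> Relation:
    """Relation f(x) > f(x') and f(x) >= 0 induced by a ranking function f."""
    return lambda x, y: f(x) > f(y) and f(x) >= 0


# Assumed candidate space, ordered from simplest to most expressive.
CANDIDATES: List[Tuple[str, Relation]] = [
    ("bottom", lambda x, y: False),
    ("f(x) = x", from_ranking(lambda x: x)),
    ("f(x) = -x", from_ranking(lambda x: -x)),
    ("f(x) = |x|", from_ranking(abs)),
]


def synthesize(examples: List[Transition]) -> Optional[Tuple[str, Relation]]:
    """Return the first candidate that contains every example transition."""
    for name, rel in CANDIDATES:
        if all(rel(x, y) for x, y in examples):
            return name, rel
    return None


def validate(rel: Relation, bound: int = 20) -> Optional[Transition]:
    """Return a program transition outside `rel`, searching states in [-bound, bound]."""
    for x in range(-bound, bound + 1):
        y = step(x)
        if y is not None and not rel(x, y):
            return (x, y)
    return None


def cegis() -> Tuple[Optional[str], List[Transition]]:
    examples: List[Transition] = []
    while True:
        candidate = synthesize(examples)
        if candidate is None:
            return None, examples      # no candidate is consistent with the examples
        name, rel = candidate
        counterexample = validate(rel)
        if counterexample is None:
            return name, examples      # accepted within the tested range
        examples.append(counterexample)
```

Each rejected candidate contributes one transition example, and the loop stops at the first candidate the bounded validator cannot refute.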


2.2 Handling Cycles in Decision Tree Learning 


We explain the importance of handling 
cycles in our decision tree-based synthesizer 
of piecewise affine ranking functions. 

In what follows, we deal with such deci- 
sion trees as shown in Fig. 4: their inter- 
nal nodes have affine inequalities (i.e. half- 
spaces); their leaves have affine functions; 
and overall, such a decision tree expresses 
a piecewise affine function (Fig. 4). When 
we remove leaf labels from such a decision 
tree, then we obtain a template of piecewise 
functions where condition guards are given 
but function bodies are not. We shall call 
the latter a segmentation. 


Input and Output of our Synthesizer. The input of our synthesizer is a set $E$ of
transition examples (e.g. $E = \{R(1,0), R(-2,-1)\}$) as explained in Sect. 2.1. The
output of our synthesizer is a well-founded relation
$R(\tilde{x}, \tilde{x}') := f(\tilde{x}) > f(\tilde{x}') \wedge f(\tilde{x}) \ge 0$, where $\tilde{x}$ is a sequence of variables
and $f(\tilde{x})$ is a piecewise affine function, which is represented by a decision tree
(Fig. 4). Therefore our synthesizer aims at learning a suitable decision tree.

\[
f(x, y) =
\begin{cases}
x - 1 & y \ge 0 \wedge x - 1 \ge 0 \\
1 - x & y \ge 0 \wedge x - 1 < 0 \\
-y & y < 0
\end{cases}
\]

Fig. 4. An example of a decision tree that represents a piecewise affine
ranking function $f(x,y)$
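A decision tree of this kind can be sketched as a small recursive data structure. The names below (`Leaf`, `Node`, `evaluate`, `FIG4`) are assumptions for this illustration, and the tree shape (root test on y, then a test on x - 1) is inferred from the guards of the function in Fig. 4 rather than taken from the figure itself.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

State = Tuple[int, int]  # (x, y)


@dataclass(frozen=True)
class Leaf:
    label: Callable[[State], int]   # affine function attached to the leaf


@dataclass(frozen=True)
class Node:
    guard: Callable[[State], int]   # affine h; the node tests h(state) >= 0
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, state: State) -> int:
    """Walk from the root to a leaf and apply that leaf's affine function."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.guard(state) >= 0 else tree.if_false
    return tree.label(state)


FIG4 = Node(
    guard=lambda s: s[1],                       # y >= 0 ?
    if_true=Node(
        guard=lambda s: s[0] - 1,               # x - 1 >= 0 ?
        if_true=Leaf(lambda s: s[0] - 1),
        if_false=Leaf(lambda s: 1 - s[0]),
    ),
    if_false=Leaf(lambda s: -s[1]),
)
```

Dropping the `label` fields from the leaves leaves only the guards, which corresponds to what the text calls a segmentation.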


[Figure 5, two panels on the $x$ axis: (a) Good ($x \ge 0$); (b) Bad ($x \ge -2$).]

Fig. 5. Selecting halfspaces. Transition examples are shown by red arrows. Boundaries
of halfspaces are shown by dashed lines.

Refining Segmentations and Handling Cycles. Roughly speaking, our synthesizer
learns decision trees in the following steps.


1. Generate a set $H$ of halfspaces from the given set $E$ of examples. This $H$
serves as the vocabulary for internal nodes. Set the initial segmentation to be
the one-node tree (i.e. the trivial segmentation).

2. Try to synthesize a piecewise affine ranking function $f$ for $E$ with the current
segmentation, that is, try to find suitable leaf labels. If found, then use this
$f$ in a candidate well-founded relation $R(\tilde{x}, \tilde{x}') = f(\tilde{x}) > f(\tilde{x}') \wedge f(\tilde{x}) \ge 0$.

3. Otherwise, refine the current segmentation with some halfspace in $H$, and go
to Step 2.


The key step of our synthesizer is Step 3. We show a few examples. 


Example 3. Suppose we are given $E = \{R(1,0), R(-2,-1)\}$ as a set of examples.
Our synthesizer proceeds as follows: (1) Our synthesizer generates the set
$H = \{x \ge 1, x \ge 0, x \ge -2, x \ge -1\}$ from the examples in $E$. (2) Our
synthesizer tries to find a ranking function of the form $f(x) = ax + b$ (with
the trivial segmentation), but there is no such ranking function. (3) Our synthesizer
refines the current segmentation with $(x \ge 0) \in H$ because $x \ge 0$
“looks good”. (4) Our synthesizer tries to find a ranking function of the form
$f(x) = \text{if } x \ge 0 \text{ then } ax + b \text{ else } cx + d$, using the current segmentation. Our
synthesizer obtains $f(x) = \text{if } x \ge 0 \text{ then } x \text{ else } {-x}$ and uses this $f(x)$ for a
candidate solution.

How can we decide which halfspace in $H$ “looks good”? We use a quality measure,
that is, a value representing the quality of each halfspace, and select the
halfspace with the maximum quality measure.

Figure 5 shows the comparison of the quality of $x \ge 0$ and $x \ge -2$ in this
example. Intuitively, $x \ge 0$ is better than $x \ge -2$ because we can obtain a
simple ranking function $\text{if } x \ge 0 \text{ then } x \text{ else } {-x}$ with $x \ge 0$ (Fig. 5a) while we
need further refinement of the segmentation with $x \ge -2$ (Fig. 5b). In Sect. 5,
we introduce a quality measure for halfspaces following this intuition.

Our synthesizer iteratively refines segmentations following this quality mea- 
sure, until examples contained in each leaf of the decision tree admit an affine 
ranking function. This approach is inspired by the use of information gain in the 
decision tree learning for invariant synthesis. 
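The three steps can be sketched for one integer variable as follows. This is a deliberately naive illustration and not the paper's algorithm: leaf labels are found by brute force over a small assumed coefficient box (`COEFFS`) instead of linear programming, halfspaces are tried in sorted order instead of by a quality measure, and at most one refinement is attempted. All names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Transition = Tuple[int, int]
COEFFS = range(-3, 4)  # assumed search box standing in for an LP solver


def halfspaces(examples: Sequence[Transition]) -> List[int]:
    """Step 1: thresholds c of candidate guards x >= c, taken from the example states."""
    return sorted({v for t in examples for v in t})


def piecewise(thresholds: Sequence[int], params: Sequence[Tuple[int, int]]) -> Callable[[int], int]:
    """f(x) = a_k * x + b_k on the k-th interval cut out by the sorted thresholds."""
    cuts = sorted(thresholds)

    def f(x: int) -> int:
        k = sum(1 for c in cuts if x >= c)
        a, b = params[k]
        return a * x + b

    return f


def is_ranking(f: Callable[[int], int], examples: Sequence[Transition]) -> bool:
    return all(f(x) > f(y) and f(x) >= 0 for x, y in examples)


def fit(thresholds: Sequence[int], examples: Sequence[Transition]) -> Optional[Callable[[int], int]]:
    """Step 2: search for leaf labels that work with the given segmentation."""
    n = len(thresholds) + 1
    for flat in product(COEFFS, repeat=2 * n):
        params = [(flat[2 * k], flat[2 * k + 1]) for k in range(n)]
        f = piecewise(thresholds, params)
        if is_ranking(f, examples):
            return f
    return None


def synthesize(examples: Sequence[Transition]):
    """Steps 1-3 with at most one refinement; returns (thresholds, f) or None."""
    f = fit([], examples)
    if f is not None:
        return [], f
    for c in halfspaces(examples):  # Step 3, without a quality measure
        f = fit([c], examples)
        if f is not None:
            return [c], f
    return None


E3 = [(1, 0), (-2, -1)]  # the examples of Example 3
```

On `E3` the trivial segmentation fails and the segmentation by x >= 0 succeeds, as in Example 3. Because this sketch has no quality measure, `synthesize` may settle on a different halfspace than the one chosen in the text.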


Example 3 showed a natural extension of a decision tree learning method for 
invariant synthesis. However, this is not enough for transition examples, for the 
reasons of explicit and implicit cycles. Here are their examples. 
Fig. 6. Two examples $R(-1,1)$ and $R(1,0)$ make an implicit cycle between $x \ge 1$ and
$\neg(x \ge 1)$.


Example 4. Suppose we are given $E = \{R(1,0), R(0,1)\}$. In this case, there
is no ranking function because $E$ contains a cycle $1 \to 0 \to 1$ witnessing non-termination.
We call such a cycle an explicit cycle.
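Explicit cycles can be found by ordinary cycle detection on the directed graph whose edges are the transition examples. The following depth-first search is a standard technique used here for illustration; the function name is an assumption, and the paper does not state which algorithm its implementation uses.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def has_explicit_cycle(examples: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if the transition examples, read as directed edges, contain a cycle."""
    graph: Dict[Hashable, Set[Hashable]] = {}
    for src, dst in examples:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color = {v: WHITE for v in graph}

    def visit(v: Hashable) -> bool:
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY:      # back edge: w is on the current path
                return True
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)
```

For the examples of Example 4 the search finds the cycle through the states 1 and 0, whereas the examples of Example 3 form an acyclic graph.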


Example 5. Let $E = \{R(-1,1), R(1,0), R(-1,-2), R(2,3)\}$ (Fig. 6). Our synthesizer
proceeds as follows. (1) Our synthesizer generates the set $H := \{x \ge 1, x \ge 0, \dots\}$
of halfspaces. (2) Our synthesizer tries to find a ranking function
of the form $f(x) = ax + b$ (with the trivial segmentation), but there is no such function.
(3) Our synthesizer refines the current segmentation with $(x \ge 1) \in H$ because
$x \ge 1$ “looks good” (i.e. is the best with respect to a quality measure).

We have reached the point where the naive extension of decision tree learning
explained in Example 3 no longer works: although all constraints contained
in each leaf of the decision tree admit an affine ranking function, there
is no piecewise affine ranking function for $E$ of the form
$f(x) = \text{if } x \ge 1 \text{ then } ax + b \text{ else } cx + d$.

More specifically, in this example, the leaf representing $x \ge 1$ contains
$R(2,3)$, and the other leaf representing $\neg(x \ge 1)$ contains $R(-1,-2)$. The example
$R(2,3)$ admits an affine ranking function $f_1(x) = -x + 2$, and $R(-1,-2)$
admits $f_2(x) = x + 1$. However, the combination
$f(x) = \text{if } x \ge 1 \text{ then } f_1(x) \text{ else } f_2(x)$ is not a ranking function for $E$. Moreover, there is no
ranking function for $E$ of the form $f(x) = \text{if } x \ge 1 \text{ then } ax + b \text{ else } cx + d$.

It is clear that this failure is caused by the crossing examples $R(-1,1)$ and
$R(1,0)$. It is not that every crossing example is harmful. However, in this case,
the set $\{R(-1,1), R(1,0)\}$ forms a cycle between the leaf for $x \ge 1$ and the
leaf for $\neg(x \ge 1)$ (see Fig. 6). This “cycle” among leaves, in contrast to explicit
cycles such as $\{R(1,0), R(0,1)\}$ in Example 4, is called an implicit cycle.

Once an implicit cycle is found, our synthesizer cuts it by refining the current
segmentation. Our synthesizer continues the above steps (1-3) of decision tree
learning as follows. (4) Our synthesizer selects $(x \ge 0) \in H$ and cuts the implicit
cycle $\{R(-1,1), R(1,0)\}$ by refining segmentations. (5) Using the refined segmentation,
our synthesizer obtains
$f(x) = \text{if } x \ge 1 \text{ then } {-x} + 2 \text{ else if } x \ge 0 \text{ then } 0 \text{ else } x + 3$
as a ranking function for $E$.
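The arithmetic of Example 5 can be replayed directly. The sketch below checks the leafwise combination and the refined function against the four examples, lists the examples that cross the border x >= 1, and searches a small coefficient box for a two-piece function. The helper names and the box size are assumptions made for this illustration; the box search only illustrates the claim and does not prove it.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[int, int]
E5: List[Transition] = [(-1, 1), (1, 0), (-1, -2), (2, 3)]


def is_ranking(f: Callable[[int], int], examples: Sequence[Transition]) -> bool:
    return all(f(x) > f(y) and f(x) >= 0 for x, y in examples)


def f_naive(x: int) -> int:
    """Leafwise labels f1 and f2 glued along the segmentation x >= 1."""
    return -x + 2 if x >= 1 else x + 1


def f_refined(x: int) -> int:
    """Function obtained after refining the segmentation with x >= 0."""
    if x >= 1:
        return -x + 2
    if x >= 0:
        return 0
    return x + 3


def crossing(examples: Sequence[Transition], threshold: int) -> List[Transition]:
    """Examples whose source and target lie on different sides of x >= threshold."""
    return [(x, y) for x, y in examples if (x >= threshold) != (y >= threshold)]


def exists_two_piece(examples: Sequence[Transition], threshold: int, box: range) -> bool:
    """Search `box` for a, b, c, d with f = a*x + b on x >= threshold, else c*x + d."""
    for a, b, c, d in product(box, repeat=4):
        f = lambda x: a * x + b if x >= threshold else c * x + d
        if is_ranking(f, examples):
            return True
    return False
```

The search comes up empty for a reason that does not depend on the box: R(-1,1) and R(1,0) together force c < 0, while R(-1,-2) forces c > 0.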


As explained in Examples 4 and 5, handling (explicit and implicit) cycles
is crucial in decision tree learning for transition examples. Moreover, our cycle
detection theorem (Theorem 17) claims that if there is no explicit or implicit
cycle, then one can find a ranking function for $E$ without further refinement of
segmentations.
3 (Non-)Termination Verification as Constraint Solving


We explain how to encode (non-)termination verification as constraint solving.
Following [31], we formalize our target class pwCSP of predicate constraint 
satisfaction problems parametrized by a first-order theory T. 


Definition 6. Given a formula $\phi$, let $\mathit{ftv}(\phi)$ be the set of free term variables
and $\mathit{fpv}(\phi)$ be the set of free predicate variables in $\phi$.


Definition 7. A pwCSP is defined as a pair $(C, \mathcal{R})$ where $C$ is a finite set of
clauses of the form

\[
\phi \vee \Bigl(\bigvee_{i=1}^{\ell} X_i(\tilde{t}_i)\Bigr) \vee \Bigl(\bigvee_{i=\ell+1}^{m} \neg X_i(\tilde{t}_i)\Bigr)
\tag{4}
\]

and $\mathcal{R} \subseteq \mathit{fpv}(C)$ is a set of predicate variables that are required to denote
well-founded relations. Here, $0 \le \ell \le m$. Meta-variables $t$ and $\phi$ range over $T$-terms
and $T$-formulas, respectively, such that $\mathit{ftv}(\phi) = \emptyset$. Meta-variables $x$ and $X$
range over term and predicate variables, respectively.


A pwCSP $(C, \mathcal{R})$ is called CHCs (constrained Horn clauses, [9]) if $\mathcal{R} = \emptyset$ and
$\ell \le 1$ for all clauses $c \in C$. The class of CHCs has been widely studied in the
verification community [12,14,28,36].


Definition 8. A predicate substitution $\sigma$ is a finite map from predicate variables
$X$ to closed predicates of the form $\lambda x_1, \dots, x_{\mathit{ar}(X)}.\ \phi$. We write $\mathit{dom}(\sigma)$ for the
domain of $\sigma$ and $\sigma(C)$ for the application of $\sigma$ to $C$.


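Definition 9. A predicate substitution $\sigma$ is a (genuine) solution for $(C, \mathcal{R})$ if (1)
$\mathit{fpv}(C) \subseteq \mathit{dom}(\sigma)$; (2) $\models \bigwedge \sigma(C)$ holds; and (3) for all $X \in \mathcal{R}$, $\sigma(X)$ represents
a well-founded relation, that is, $\mathit{sort}(\sigma(X)) = (\tilde{s}, \tilde{s}) \to \bullet$ for some sequence $\tilde{s}$ of
sorts and there is no infinite sequence $\tilde{v}_1, \tilde{v}_2, \dots$ of sequences $\tilde{v}_i$ of values of the
sorts $\tilde{s}$ such that $\models \sigma(X)(\tilde{v}_i, \tilde{v}_{i+1})$ for all $i \ge 1$.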


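Encoding Termination. Given a set of initial states $\iota(\tilde{x})$ and a transition relation
$\tau(\tilde{x}, \tilde{x}')$, the termination verification problem is expressed by the pwCSP $(C, \mathcal{R})$
where $\mathcal{R} = \{R\}$, and $C$ consists of the following clauses.

\[
\iota(\tilde{x}) \implies I(\tilde{x})
\qquad
I(\tilde{x}) \wedge \tau(\tilde{x}, \tilde{x}') \implies I(\tilde{x}')
\qquad
I(\tilde{x}) \wedge \tau(\tilde{x}, \tilde{x}') \implies R(\tilde{x}, \tilde{x}')
\]

We use $\phi \implies \psi$ as syntax sugar for $\neg\phi \vee \psi$, so this is a pwCSP. The well-founded
relation $R$ asserts that $\tau$ is terminating. We also consider an invariant
$I$ for $\tau$ to avoid synthesizing ranking functions on unreachable program states.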
Encoding Non-termination. We can also encode a problem of non-termination
verification as a pwCSP via recurrent sets [20]. For simplicity, we explain the
encoding for the case of only one program variable $x$. We consider a recurrent
set $R$ satisfying the following conditions.

\[ \iota(x) \implies R(x) \tag{5} \]
\[ R(x) \implies \exists x'.\ \tau(x, x') \wedge R(x') \tag{6} \]

To remove $\exists$ from (6), we use the following constraints, which are equivalent to (6).

\[ R(x) \implies E(x, 0) \tag{7} \]
\[
E(x, x') \implies (\tau(x, x') \wedge R(x'))
\vee (S(x', x' - 1) \wedge E(x, x' - 1)) \vee (S(x', x' + 1) \wedge E(x, x' + 1))
\tag{8}
\]

The intuition is as follows. Given $x$ in the recurrent set $R$, the relation $E(x, x')$
searches for a witness of $\exists x'$ in (6). The search starts from $x' = 0$ in (7), and
$x'$ is nondeterministically incremented or decremented in (8). The well-founded
relation $S$ asserts that the search finishes within finite steps. As a result, we
obtain a pwCSP for non-termination defined by $(C, \mathcal{R})$ where $\mathcal{R} = \{S\}$ and $C$ is
given by (5), (7), and (the disjunctive normal form of) (8).


Example 10. Consider the following C program. 
while(x > 0) { x = -2 * x + 9; }


The non-termination problem is encoded as the pwCSP $(C, \mathcal{R})$ where $\mathcal{R} = \{S\}$,
and $C$ consists of

\[ x > 0 \implies R(x) \qquad R(x) \implies E(x, 0) \]
\[
E(x, x') \implies x' = -2x + 9 \wedge R(x')
\vee (S(x', x' - 1) \wedge E(x, x' - 1)) \vee (S(x', x' + 1) \wedge E(x, x' + 1)).
\]


The program is non-terminating when $x = 3$. This is witnessed by a solution $\sigma$ for
$(C, \mathcal{R})$, which is given by $\sigma(R)(x) := x = 3$, $\sigma(E)(x, x') := x = 3 \wedge 0 \le x' \wedge x' \le 3$,
and $\sigma(S)(x', x'') := x'' = x' + 1 \wedge x'' \le 3$.
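The witness in Example 10 can be sanity-checked by brute force. The sketch below confirms that x = 3 is a fixed point of the loop body and checks the instances of (7) and (8) for the predicates given above over a bounded range. The function names and the range are assumptions made for this illustration, and a bounded check is not a proof of validity.

```python
from typing import Optional


def step(x: int) -> Optional[int]:
    """One iteration of: while (x > 0) { x = -2 * x + 9; }"""
    return -2 * x + 9 if x > 0 else None


def R(x: int) -> bool:
    return x == 3


def E(x: int, xp: int) -> bool:
    return x == 3 and 0 <= xp <= 3


def S(xp: int, xpp: int) -> bool:
    return xpp == xp + 1 and xpp <= 3


def clause7(x: int) -> bool:
    """R(x) implies E(x, 0)."""
    return (not R(x)) or E(x, 0)


def clause8(x: int, xp: int) -> bool:
    """E(x, x') implies one of the three disjuncts of (8), with tau(x, x') as x' = -2x + 9."""
    if not E(x, xp):
        return True
    return (
        (xp == -2 * x + 9 and R(xp))
        or (S(xp, xp - 1) and E(x, xp - 1))
        or (S(xp, xp + 1) and E(x, xp + 1))
    )


def check(bound: int = 20) -> bool:
    values = range(-bound, bound + 1)
    return all(clause7(x) for x in values) and all(
        clause8(x, xp) for x in values for xp in values
    )
```

Starting from x' = 0, the search encoded by E increments x' until it reaches 3, where the transition disjunct holds; S only admits increments up to 3, so the search is finite.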


4 CounterExample-Guided Inductive Synthesis (CEGIS) 


We explain how CounterExample-Guided Inductive Synthesis [29] (CEGIS for 
short) works for a given pwCSP (C, R) following [31]. Then, we add the extraction 
of positive/negative examples to the CEGIS architecture, which enables our 
decision tree-based synthesizer to use a simplified form of examples. 

CEGIS proceeds through the iterative interaction between a synthesizer and 
a validator (Fig. 1), in which they exchange examples and candidate solutions. 


Definition 11. A formula $\phi$ is an example of $C$ if $\mathit{ftv}(\phi) = \emptyset$ and $\bigwedge C \models \phi$ hold.
Given a set $E$ of examples of $C$, a predicate substitution $\sigma$ is a candidate solution
for $(C, \mathcal{R})$ that is consistent with $E$ if $\sigma$ is a solution for $(E, \mathcal{R})$.
Synthesizer. The input for a synthesizer is a set $E$ of examples of $C$ collected from
previous CEGIS iterations. The synthesizer tries to find a candidate solution
$\sigma$ consistent with $E$ instead of a genuine solution for $(C, \mathcal{R})$. If the candidate
solution $\sigma$ is found, then $\sigma$ is passed to the validator. If $E$ is unsatisfiable, then
$E$ witnesses unsatisfiability of $(C, \mathcal{R})$. Details of our synthesizer are described in
Sect. 5.


Validator. A validator checks whether the candidate solution $\sigma$ from the synthesizer
is a genuine solution of $(C, \mathcal{R})$ by using SMT solvers. That is, satisfiability
of $\neg\bigwedge\sigma(C)$ is checked. If $\neg\bigwedge\sigma(C)$ is not satisfiable, then $\sigma$ is a genuine
solution of the original pwCSP $(C, \mathcal{R})$, so the validator accepts this. Otherwise,
the validator adds new examples to the set $E$ of examples. Finally, the synthesizer
is invoked again with the updated set $E$ of examples.

If $\neg\bigwedge\sigma(C)$ is satisfiable, new examples are constructed as follows. Using
SMT solvers, the validator obtains an assignment $\theta$ to term variables such that
$\models \neg\theta(\psi)$ holds for some $\psi \in \sigma(C)$. By (4), $\neg\theta(\psi)$ is of the form
$\neg\theta(\phi) \wedge \bigl(\bigwedge_{i=1}^{\ell} \neg\sigma(X_i)(\theta(\tilde{t}_i))\bigr) \wedge \bigl(\bigwedge_{i=\ell+1}^{m} \sigma(X_i)(\theta(\tilde{t}_i))\bigr)$.
To prevent this counterexample
from being found in the next CEGIS iteration again, the validator adds the
following example to $E$.

\[
\bigvee_{i=1}^{\ell} X_i(\theta(\tilde{t}_i)) \vee \bigvee_{i=\ell+1}^{m} \neg X_i(\theta(\tilde{t}_i))
\tag{9}
\]


The CEGIS architecture repeats this interaction between the synthesizer and 
the validator until a genuine solution for (C, R) is found or E witnesses unsatis- 
fiability of (C, R). 


Extraction of Positive/Negative Examples. Examples obtained in the above
explanation are a bit complex to handle in our decision tree-based synthesizer:
each example in $E$ is a disjunction (9) of literals, which may contain multiple
predicate variables.

To simplify the form of examples, we extract from $E$ the sets $E_X^+$ and $E_X^-$ of
positive examples (i.e., examples of the form $X(\tilde{v})$) and negative examples (i.e.,
examples of the form $\neg X(\tilde{v})$) for each $X \in \mathit{fpv}(E)$. This allows us to synthesize a
predicate $\sigma(X)$ for each predicate variable $X \in \mathit{fpv}(E)$ separately. For simplicity,
we write $\tilde{v} \in E_X^+$ and $\tilde{v} \in E_X^-$ instead of $X(\tilde{v}) \in E_X^+$ and $\neg X(\tilde{v}) \in E_X^-$.

The extraction is done as follows. We first substitute for each predicate variable
application $X(\tilde{v})$ in $E$ a boolean variable $b_{X(\tilde{v})}$ to obtain a SAT problem
$\mathit{SAT}(E)$. Then, we use SAT solvers to obtain an assignment $\eta$ that is a solution
for $\mathit{SAT}(E)$. If a solution $\eta$ exists, then we construct positive/negative examples
from $\eta$; otherwise, $E$ is unsatisfiable.


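Definition 12. Let $\eta$ be a solution for $\mathit{SAT}(E)$. For each predicate variable
$X \in \mathit{fpv}(E)$, we define the set $E_X^+$ of positive examples and the set $E_X^-$ of negative
examples under the assignment $\eta$ by $E_X^+ := \{\tilde{v} \mid \eta(b_{X(\tilde{v})}) = \mathit{true}\}$ and
$E_X^- := \{\tilde{v} \mid \eta(b_{X(\tilde{v})}) = \mathit{false}\}$.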
Note that some predicate variable applications $X(\tilde{v})$ may be assigned neither true
nor false because they do not affect the evaluation of $\mathit{SAT}(E)$. Such predicate
variable applications are discarded from $\{(E_X^+, E_X^-)\}_{X \in \mathit{fpv}(E)}$.
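The extraction step can be sketched with a brute-force search in place of a SAT solver. In the sketch below a literal is a triple of predicate name, argument tuple and polarity, and an example is a list of literals read as a disjunction. This encoding, the function name `extract` and the sample data are assumptions for this illustration. Unlike the procedure described above, the exhaustive search assigns every atom a value, so no application is discarded.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Literal = Tuple[str, Tuple[int, ...], bool]  # (predicate, arguments, polarity)
Clause = List[Literal]                       # disjunction of literals
Split = Dict[str, Set[Tuple[int, ...]]]


def extract(examples: List[Clause]) -> Optional[Tuple[Split, Split]]:
    """Find an assignment satisfying every clause and split atoms by truth value."""
    atoms = sorted({(p, args) for clause in examples for (p, args, _) in clause})
    for values in product([True, False], repeat=len(atoms)):
        eta = dict(zip(atoms, values))
        if all(any(eta[(p, a)] == pol for (p, a, pol) in clause) for clause in examples):
            positive: Split = {}
            negative: Split = {}
            for (p, a), value in eta.items():
                (positive if value else negative).setdefault(p, set()).add(a)
            return positive, negative
    return None  # the example set is unsatisfiable


# R(1,0);  I(0) or R(-2,-1);  not I(5)
SAMPLE: List[Clause] = [
    [("R", (1, 0), True)],
    [("I", (0,), True), ("R", (-2, -1), True)],
    [("I", (5,), False)],
]
```

The returned pair groups, per predicate variable, the argument tuples assigned true and those assigned false; a ranking function synthesizer would then receive the entries for predicates required to be well-founded.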

Our method uses the extraction of positive and negative examples when the
validator passes examples to the synthesizer. If $X \in \mathit{fpv}(E) \cap \mathcal{R}$, then we apply
our ranking function synthesizer to $(E_X^+, E_X^-)$. If $X \in \mathit{fpv}(E) \setminus \mathcal{R}$, then we apply
an invariant synthesizer.

We say a candidate solution ø is consistent with {(EX,Ex)}xeppuey if H 
o(X)(v*) and H 70(X)(0—) hold for each predicate variable X € fpu(E), OT € 
Ex, and Ù- € Ex. If a candidate solution ø is consistent with {(E$, Ex) }xefpu(£)» 
then ø is also consistent with €. 

Note that unsatisfiability of {(£$, Ex )}xe fpo(£) does not immediately implies 
unsatisfiability of E nor (C, R) because {(EX, Ex) } xe fpu(e) depends on the choice 
of the assignment 7. Therefore, the CEGIS architecture need to be modified: if 
synthesizers find unsatisfiability of {(€¢,Ex) }xe fpv(€), then we add the negation 
of an unsatisfiability core to E to prevent using the same assignment 7 again. 

Note that some restricted forms of (9) have also been considered in previous 
work and are called implication examples in [17] and implication/negation con- 
straints in [12]. Our extraction of positive and negative examples is applicable 
to the general form of (9). 


5 Ranking Function Synthesis 


In this section, we describe one of our main contributions, that is, our decision
tree-based synthesizer, which synthesizes a candidate well-founded relation $\sigma(R)$
from a finite set $\mathcal{E}_R^+$ of examples. We assume that only positive examples are given
because well-founded relations occur only positively in pwCSP for termination
analysis (see Sect. 3). The aim of our synthesizer is to find a piecewise affine
lexicographic ranking function $\tilde{f}(\tilde{x})$ for the given set $\mathcal{E}_R^+$ of examples. Below, we
fix a predicate variable $R \in \mathcal{R}$ and omit the subscript, writing $\mathcal{E}^+$ for $\mathcal{E}_R^+$.


5.1 Basic Definitions 


To represent piecewise affine lexicographic ranking functions, we use decision
trees like the one in Fig. 4. Let $\tilde{x} = (x_1, \dots, x_n)$ be the program variables where
each $x_i$ ranges over $\mathbb{Z}$.


Definition 13. A decision tree $D$ is defined by
$D ::= \tilde{g}(\tilde{x}) \mid \mathbf{if}\ h(\tilde{x}) \ge 0\ \mathbf{then}\ D\ \mathbf{else}\ D$
where $\tilde{g}(\tilde{x}) = (g_k(\tilde{x}), \dots, g_0(\tilde{x}))$ is a tuple of affine functions and $h(\tilde{x})$
is an affine function. A segmentation tree $S$ is defined as a decision tree with
undefined leaves $\bot$: that is, $S ::= \bot \mid \mathbf{if}\ h(\tilde{x}) \ge 0\ \mathbf{then}\ S\ \mathbf{else}\ S$. For each
decision tree $D$, we can canonically assign a segmentation tree by replacing the
label of each leaf with $\bot$. This is denoted by $S(D)$. For each decision tree $D$, we
denote the corresponding piecewise affine function by $\tilde{f}_D(\tilde{x}) : \mathbb{Z}^n \to \mathbb{Z}^{k+1}$.

Each leaf in a segmentation tree $S$ corresponds to a polyhedron. We often identify
the segmentation tree $S$ with the set of leaves of $S$ and a leaf with the polyhedron
corresponding to the leaf. For example, we write statements such as "for each $L \in S$,
$\tilde{v} \in L$ is a point in the polyhedron $L$".
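One possible concrete representation of these trees is sketched below in Python. The class names `Leaf` and `Node`, the encoding of an affine function as a pair `(coefficients, constant)`, and the branch test `h(x) >= 0` are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# An affine function c_1*x_1 + ... + c_n*x_n + c_0, stored as (coefficients, constant).
Affine = Tuple[Tuple[int, ...], int]

def eval_affine(f: Affine, x: Tuple[int, ...]) -> int:
    coeffs, const = f
    return sum(c * xi for c, xi in zip(coeffs, x)) + const

@dataclass
class Leaf:
    funcs: Tuple[Affine, ...]   # the tuple (g_k, ..., g_0)

@dataclass
class Node:
    h: Affine
    then_branch: "Tree"         # taken when h(x) >= 0
    else_branch: "Tree"         # taken when h(x) < 0

Tree = Union[Leaf, Node]

def evaluate(tree: Tree, x: Tuple[int, ...]) -> Tuple[int, ...]:
    """Value of the piecewise affine function represented by the tree at point x."""
    while isinstance(tree, Node):
        tree = tree.then_branch if eval_affine(tree.h, x) >= 0 else tree.else_branch
    return tuple(eval_affine(g, x) for g in tree.funcs)

def segmentation(tree: Tree):
    """S(D): keep the branching structure, replace every leaf label by None."""
    if isinstance(tree, Leaf):
        return None
    return (tree.h, segmentation(tree.then_branch), segmentation(tree.else_branch))

# Example: "if x - y >= 0 then y else x", i.e. min(x, y) over variables (x, y).
MIN_TREE = Node(((1, -1), 0), Leaf((((0, 1), 0),)), Leaf((((1, 0), 0),)))
```

With this encoding, `evaluate(MIN_TREE, (3, 5))` follows the else branch and returns the one-element tuple containing `x`.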

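Suppose we are given a segmentation tree $S$ and a set $\mathcal{E}^+$ of examples.


Definition 14. For each $L_1, L_2 \in S$, we denote the set of example transitions
from $L_1$ to $L_2$ by $\mathcal{E}^+_{L_1,L_2} := \{(\tilde{v}, \tilde{v}') \in \mathcal{E}^+ \mid \tilde{v} \in L_1, \tilde{v}' \in L_2\}$. An example
$(\tilde{v}, \tilde{v}') \in \mathcal{E}^+$ is crossing w.r.t. $S$ if $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{L_1,L_2}$ for some $L_1 \neq L_2$, and
non-crossing if $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{L,L}$ for some $L$.


Definition 15. We define the dependency graph $G(S, \mathcal{E}^+)$ for $S$ and $\mathcal{E}^+$ by the
graph $(V, E)$ where vertices $V = S$ are leaves, and edges
$E = \{(L_1, L_2) \mid L_1 \neq L_2, \exists(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{L_1,L_2}\}$ are crossing examples.


We denote the set of start points $\tilde{v}$ and end points $\tilde{v}'$ of examples $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+$
by $\underline{\mathcal{E}}^+ := \{\tilde{v} \mid (\tilde{v}, \tilde{v}') \in \mathcal{E}^+\} \cup \{\tilde{v}' \mid (\tilde{v}, \tilde{v}') \in \mathcal{E}^+\}$.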
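The two definitions above can be sketched in a few lines of Python. Here a segmentation is abstracted as a function `leaf` that maps a point to an identifier of the leaf containing it; this interface and the name `classify` are assumptions of the sketch.

```python
def classify(examples, leaf):
    """examples: iterable of transitions (v, v2); leaf: point -> leaf identifier.
    Returns (non_crossing, crossing, edges) where non_crossing maps each leaf L
    to its examples staying inside L, crossing lists the remaining examples,
    and edges is the edge set of the dependency graph."""
    non_crossing, crossing, edges = {}, [], set()
    for v, v2 in examples:
        l1, l2 = leaf(v), leaf(v2)
        if l1 == l2:
            non_crossing.setdefault(l1, []).append((v, v2))
        else:
            crossing.append((v, v2))
            edges.add((l1, l2))   # one edge per ordered pair of distinct leaves
    return non_crossing, crossing, edges
```

For a one-variable segmentation that splits at `x >= 0`, the transition from `1` to `-1` is the only crossing example among `2 -> 1`, `1 -> -1` and `-1 -> -2`, so the dependency graph has a single edge.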


5.2 Segmentation and (Explicit and Implicit) Cycles: 
One-Dimensional Case 


For simplicity, we first consider the case where $\tilde{f}(\tilde{x}) = f(\tilde{x}) : \mathbb{Z}^n \to \mathbb{Z}$ is a
one-dimensional ranking function. Our aim is to find a ranking function $f(\tilde{x})$ for $\mathcal{E}^+$,
which satisfies $\forall(\tilde{v}, \tilde{v}') \in \mathcal{E}^+.\ f(\tilde{v}) > f(\tilde{v}')$ and $\forall(\tilde{v}, \tilde{v}') \in \mathcal{E}^+.\ f(\tilde{v}) \ge 0$. If our
ranking function synthesizer finds such a ranking function $f(\tilde{x})$, then a candidate
well-founded relation $R_f$ is constructed as $R_f(\tilde{x}, \tilde{x}') := f(\tilde{x}) \ge 0 \land f(\tilde{x}) > f(\tilde{x}')$.

Our synthesizer builds a decision tree $D$ to find a ranking function $f_D(\tilde{x})$ for
$\mathcal{E}^+$. The main question in doing so is “when and how should we refine partitions
of decision trees?” To answer this question, we consider the case where there
is no ranking function $f_D(\tilde{x})$ for $\mathcal{E}^+$ with a fixed segmentation $S$, and classify
reasons for this into three cases as follows.


Case 1: Explicit Cycles in Examples. We define an explicit cycle in $\mathcal{E}^+$ as a
cycle in the graph $(\mathbb{Z}^n, \mathcal{E}^+)$. An explicit cycle witnesses that there is no ranking
function for $\mathcal{E}^+$ (see e.g., Example 4).


Case 2: Non-crossing Examples are Unsatisfiable. The second case is when there
is a leaf $L \in S$ such that no affine (not piecewise affine) ranking function for the
set $\mathcal{E}^+_{L,L}$ of non-crossing examples exists. This prohibits the existence of a piecewise
affine ranking function $f_D(\tilde{x})$ for $\mathcal{E}^+$ with segmentation $S = S(D)$ because the restriction
of $f_D(\tilde{x})$ to $L \in S$ must be an affine ranking function for $\mathcal{E}^+_{L,L}$.


Case 3: Implicit Cycles in the Dependency Graph. We define an implicit cycle
as a cycle in the dependency graph $G(S, \mathcal{E}^+)$. Case 3 is the case where an
implicit cycle prohibits the existence of piecewise affine ranking functions for $\mathcal{E}^+$
with the segmentation $S$ (e.g., Example 5). If Case 1 and Case 2 do not hold
but no piecewise affine ranking function for $\mathcal{E}^+$ with the segmentation $S$ exists,
then there must be an implicit cycle by (the contraposition of) the following
proposition.


Proposition 16. Assume $\mathcal{E}^+$ is a set of examples that does not contain explicit
cycles (i.e. Case 1 does not hold). Let $S$ be a segmentation tree and assume
that for each $L \in S$, there exists an affine ranking function $f_L(\tilde{x})$ for $\mathcal{E}^+_{L,L}$ (i.e.
Case 2 does not hold). If the dependency graph $G(S, \mathcal{E}^+)$ is acyclic, then there
exists a decision tree $D$ with the segmentation $S(D) = S$ such that $f_D(\tilde{x})$ is a
ranking function for $\mathcal{E}^+$.


Proof. By induction on the height (i.e. the length of a longest path from a
vertex) of vertices in $G(S, \mathcal{E}^+)$. We construct a decision tree $D$ as follows. If
the height of $L \in S$ is 0, then we assign $f'_L(\tilde{x}) := f_L(\tilde{x})$ to the leaf $L$ where
$f_L(\tilde{x})$ is a ranking function for $\mathcal{E}^+_{L,L}$. If the height of $L \in S$ is $n > 0$, then we
assign $f'_L(\tilde{x}) := f_L(\tilde{x}) + c$ to the leaf $L$ where $c \in \mathbb{Z}$ is a constant that satisfies
$\forall(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{L,L'}.\ f_L(\tilde{v}) + c > f'_{L'}(\tilde{v}')$ for each cell $L'$ with the height less than $n$.
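The construction in this proof can be sketched in Python as follows. The function name `shift_constants` and its dictionary-based interface are assumptions of the sketch. It processes leaves recursively (successors first) rather than explicitly by height, which visits them in a compatible order when the graph is acyclic. As an extra precaution not spelled out in the proof, the sketch also chooses each constant nonnegative and large enough that the shifted function is nonnegative at the start point of every crossing example.

```python
def shift_constants(leaves, edges, crossing, f):
    """leaves: list of leaf identifiers.
    edges: set of (L1, L2) pairs forming an ACYCLIC dependency graph.
    crossing: dict mapping each edge (L1, L2) to its list of examples (v, v2).
    f: dict mapping each leaf to a ranking function (a callable on points)
       for the non-crossing examples of that leaf.
    Returns a dict leaf -> constant c such that f[L](v) + c[L] is a ranking
    function for all examples."""
    succ = {l: [l2 for (l1, l2) in edges if l1 == l] for l in leaves}
    c = {}

    def solve(l):
        if l in c:
            return c[l]
        best = 0  # a nonnegative shift keeps non-crossing examples valid
        for l2 in succ[l]:
            c2 = solve(l2)
            for v, v2 in crossing[(l, l2)]:
                # need f_l(v) + c >= 0 and f_l(v) + c > f_l2(v2) + c2
                best = max(best, -f[l](v), f[l2](v2) + c2 - f[l](v) + 1)
        c[l] = best
        return best

    for l in leaves:
        solve(l)
    return c
```

For two leaves `A` and `B` with the identity as local ranking function and one crossing example from point `0` in `A` to point `5` in `B`, the sketch keeps `B` unshifted and shifts `A` by `6`, so that `0 + 6 > 5`.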


Note that the converse of Proposition 16 does not hold: the existence of implicit
cycles in $G(S, \mathcal{E}^+)$ does not necessarily imply that no piecewise affine ranking
function exists with the segmentation $S$.


5.3 Segmentation and (Explicit and Implicit) Cycles: 
Multi-Dimensional Lexicographic Case 


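We consider a more general case where $\tilde{f}(\tilde{x}) = (f_k(\tilde{x}), \dots, f_0(\tilde{x}))$ is a
multi-dimensional lexicographic ranking function and $k$ is a fixed nonnegative integer.

Given a function $\tilde{f}(\tilde{x})$, we consider the well-founded relation $R_{\tilde{f}}(\tilde{x}, \tilde{x}')$ defined
inductively as follows.


$$R_{()}(\tilde{x}, \tilde{x}') := \bot \qquad R_{(f_k, \dots, f_0)}(\tilde{x}, \tilde{x}') := \bigl(f_k(\tilde{x}) \ge 0 \land f_k(\tilde{x}) > f_k(\tilde{x}')\bigr) \lor \bigl(f_k(\tilde{x}) = f_k(\tilde{x}') \land R_{(f_{k-1}, \dots, f_0)}(\tilde{x}, \tilde{x}')\bigr) \qquad (10)$$


Our aim here is to find a lexicographic ranking function $\tilde{f}(\tilde{x})$ for $\mathcal{E}^+$, i.e. a
function $\tilde{f}(\tilde{x})$ such that $R_{\tilde{f}}(\tilde{v}, \tilde{v}')$ holds for each $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+$. Our synthesizer
does so by building a decision tree. The same argument as in the one-dimensional
case holds for lexicographic ranking functions.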
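To make the inductive definition (10) concrete, the following Python sketch checks the relation on the already evaluated tuples. The function name `lex_rel` and the convention of passing the values `(f_k(x), ..., f_0(x))` directly are assumptions of the sketch; the comparison `a >= 0` follows the reading of (10) in which the leading component must be nonnegative.

```python
def lex_rel(fx, fx2):
    """fx  = (f_k(x),  ..., f_0(x)), fx2 = (f_k(x'), ..., f_0(x')).
    Returns True iff the lexicographic relation of definition (10) holds."""
    for a, b in zip(fx, fx2):
        if a >= 0 and a > b:      # this component is bounded and decreases
            return True
        if a != b:                # otherwise it must stay equal to continue
            return False
    return False                  # empty tuple: the relation is false
```

For example, `lex_rel((1, 5), (1, 3))` holds because the first components are equal and the second one decreases from a nonnegative value, whereas `lex_rel((-1, 0), (-2, 0))` fails because the first component is negative and changes.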


Theorem 17 (cycle detection). Assume $\mathcal{E}^+$ is a set of examples that does not
contain explicit cycles. Let $S$ be a segmentation tree and assume that for each $L \in S$,
there exists an affine function $\tilde{f}_L(\tilde{x})$ that satisfies $\forall(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{L,L}.\ R_{\tilde{f}_L}(\tilde{v}, \tilde{v}')$.
If the dependency graph $G(S, \mathcal{E}^+)$ is acyclic, then there exists a decision tree $D$
with the segmentation $S(D) = S$ such that $R_{\tilde{f}_D}(\tilde{v}, \tilde{v}')$ holds for each $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+$.


Proof. The proof is almost the same as that of Proposition 16. Here, note that if
$\tilde{f}'(\tilde{x}) = \tilde{f}(\tilde{x}) + \tilde{c}$ where $\tilde{c}$ is a tuple of nonnegative integer constants, then $R_{\tilde{f}'}(\tilde{x}, \tilde{x}')$
subsumes $R_{\tilde{f}}(\tilde{x}, \tilde{x}')$.


Algorithm 1 Building decision trees.

Input: a set $\mathcal{E}^+$ of examples, an integer $k \ge 0$
Output: a well-founded relation $R$ such that $\forall(\tilde{x}, \tilde{x}') \in \mathcal{E}^+.\ R(\tilde{x}, \tilde{x}')$

 1: if $\mathcal{E}^+$ has a cycle then
 2:     return unsatisfiable
 3: end if
 4: $D$ := RESOLVECASE2($\mathcal{E}^+$)
 5: while true do
 6:     $C$ := GETCONSTRAINTS($D$, $\mathcal{E}^+$)
 7:     $O$ := SUMABSPARAMS($D$)
 8:     $\rho$ := MINIMIZE($O$, $C$)
 9:     if $\rho$ is defined then
10:         $\tilde{f}(\tilde{x}) := \tilde{f}_{\rho(D)}(\tilde{x})$
11:         return $R_{\tilde{f}}$
12:     else
13:         get an unsat core in $C$
14:         find an implicit cycle $(\tilde{v}_1, \tilde{v}'_1), \dots, (\tilde{v}_l, \tilde{v}'_l)$ in the unsat core
15:         find a cell $C$ and two distinct points $\tilde{v}'_i, \tilde{v}_{i+1} \in C$ in the implicit cycle
16:         add a halfspace to separate $\tilde{v}'_i$ and $\tilde{v}_{i+1}$ and update $D$
17:     end if
18: end while


5.4 Our Decision Tree Learning Algorithm 


We design a concrete algorithm based on Theorem 17. It is shown in Algorithm 1 
and consists of three phases. We shall describe the three phases one by one. 


Phase 1. Phase 1 (Line 1-3) detects explicit cycles in E+ to exclude Case 1. 
Here, we use a cycle detection algorithm for directed graphs. 
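Phase 1 can be realised by any standard cycle detection on the directed graph whose edges are the transition examples. The sketch below uses an iterative depth-first search with the usual white/grey/black colouring; the function name `has_explicit_cycle` is an assumption of this sketch, and the paper does not state which algorithm its implementation uses.

```python
def has_explicit_cycle(examples):
    """examples: iterable of transitions (v, v2) between hashable points.
    Returns True iff the directed graph formed by the examples has a cycle."""
    graph = {}
    for v, v2 in examples:
        graph.setdefault(v, []).append(v2)
        graph.setdefault(v2, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on DFS stack / finished
    color = {v: WHITE for v in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:               # all successors explored
                color[node] = BLACK
                stack.pop()
            elif color[nxt] == GREY:      # back edge: a cycle exists
                return True
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(graph[nxt])))
    return False
```

Points are assumed to be tuples (never `None`), so `None` can safely serve as the end-of-iteration marker.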


Phase 2. Phase 2 (Line 4) detects and resolves Case 2 by using RESOLVECASE2
(Algorithm 2), which is a function that grows a decision tree recursively.
RESOLVECASE2 takes non-crossing examples in a leaf, divides the leaf,
and returns a template tree that is fine enough to avoid Case 2. Here, template
trees are decision trees whose leaves are labeled by affine templates.

Algorithm 2 shows the details of RESOLVECASE2. RESOLVECASE2 builds a
template tree recursively starting from the trivial segmentation $S = \bot$ and all
given examples. In each polyhedron, RESOLVECASE2 checks whether the set $C$
of constraints imposed by non-crossing examples can be satisfied by an affine
lexicographic ranking function on the polyhedron (Line 2-3). If the set $C$ of
constraints is not satisfiable, then RESOLVECASE2 chooses a halfspace $h(\tilde{x}) \ge 0$
(Line 6) and divides the current polyhedron by the halfspace.

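Algorithm 2 Resolving Case 2.

 1: function RESOLVECASE2($\mathcal{E}'^+$)
 2:     $\tilde{f}$ := MAKEAFFINETEMPLATE($k$)
 3:     $C$ := GETCONSTRAINTS($\tilde{f}$, $\mathcal{E}'^+$)
 4:     $\rho$ := GETMODEL($C$)
 5:     if $\rho$ is undefined then
 6:         $h$ := CHOOSEQUALIFIER($\mathcal{E}'^+$)
 7:         $D_{\ge 0}$ := RESOLVECASE2($\{(\tilde{v}, \tilde{v}') \in \mathcal{E}'^+ \mid h(\tilde{v}) \ge 0 \land h(\tilde{v}') \ge 0\}$)
 8:         $D_{<0}$ := RESOLVECASE2($\{(\tilde{v}, \tilde{v}') \in \mathcal{E}'^+ \mid h(\tilde{v}) < 0 \land h(\tilde{v}') < 0\}$)
 9:         return (if $h(\tilde{x}) \ge 0$ then $D_{\ge 0}$ else $D_{<0}$)
10:     else
11:         return $\tilde{f}$
12:     end if
13: end function
14: function GETCONSTRAINTS($D$, $\mathcal{E}^+$)
15:     return $\{R_{\tilde{f}_D}(\tilde{v}, \tilde{v}') \mid (\tilde{v}, \tilde{v}') \in \mathcal{E}^+\}$ where $\tilde{f}_D$ is the tuple of piecewise affine
        functions corresponding to $D$
16: end function


Algorithm 3 A criterion for eager qualifier selection.

 1: function QUALITYMEASURE($h$, $\mathcal{E}'^+$)
 2:     $\mathcal{E}_{++} := \{(\tilde{v}, \tilde{v}') \in \mathcal{E}'^+ \mid h(\tilde{v}) \ge 0 \land h(\tilde{v}') \ge 0\}$
 3:     $\mathcal{E}_{+-} := \{(\tilde{v}, \tilde{v}') \in \mathcal{E}'^+ \mid h(\tilde{v}) \ge 0 \land h(\tilde{v}') < 0\}$
 4:     $\mathcal{E}_{-+} := \{(\tilde{v}, \tilde{v}') \in \mathcal{E}'^+ \mid h(\tilde{v}) < 0 \land h(\tilde{v}') \ge 0\}$
 5:     $\mathcal{E}_{--} := \{(\tilde{v}, \tilde{v}') \in \mathcal{E}'^+ \mid h(\tilde{v}) < 0 \land h(\tilde{v}') < 0\}$
 6:     $\tilde{f}$ := MAKEAFFINETEMPLATE($k$)
 7:     $C_+$ := GETCONSTRAINTS($\tilde{f}$, $\mathcal{E}_{++}$);  $C_-$ := GETCONSTRAINTS($\tilde{f}$, $\mathcal{E}_{--}$)
 8:     $N_+$ := MAXSMT($C_+$);  $N_-$ := MAXSMT($C_-$)
 9:     return $N_+ + N_- + (|\mathcal{E}_{+-}| + |\mathcal{E}_{-+}|)(1 - \mathrm{entropy}(|\mathcal{E}_{+-}|, |\mathcal{E}_{-+}|))$
10: end function


There is a certain amount of freedom in the choice of halfspaces. To guarantee
termination of the whole algorithm, we require that the chosen halfspace $h$
separates at least one point in $\underline{\mathcal{E}'}^+ := \{\tilde{v} \mid (\tilde{v}, \tilde{v}') \in \mathcal{E}'^+\} \cup \{\tilde{v}' \mid (\tilde{v}, \tilde{v}') \in \mathcal{E}'^+\}$
from the other points in $\underline{\mathcal{E}'}^+$. That is: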


Assumption 18. If halfspace $h(\tilde{x}) \ge 0$ is chosen in Line 6 of Algorithm 2, then
there exist $\tilde{v}, \tilde{u} \in \underline{\mathcal{E}'}^+$ such that $h(\tilde{v}) \ge 0$ and $h(\tilde{u}) < 0$.


We explain two strategies (eager and lazy) to choose halfspaces that can be
used to implement CHOOSEQUALIFIER. Both of them are guaranteed to terminate
and, moreover, are intended to yield simple decision trees.


Eager Strategy. In the eager strategy, we eagerly generate a finite set $H$ of
halfspaces from the set $\mathcal{E}^+$ of all examples beforehand and choose the best one
from $H$ with respect to a certain quality measure. To satisfy Assumption 18,
$H$ is generated so that any two points $\tilde{u}, \tilde{v} \in \underline{\mathcal{E}}^+$ can be separated by some
halfspace $(h(\tilde{x}) \ge 0) \in H$.

For example, we can use intervals $H = \{\pm(x_i - a_i) \ge 0 \mid i = 1, \dots, n \land
(a_1, \dots, a_n) \in \underline{\mathcal{E}}^+\}$ and octagons $H = \{\pm(x_i - a_i) \pm (x_j - a_j) \ge 0 \mid i \neq j
\land (a_1, \dots, a_n) \in \underline{\mathcal{E}}^+\}$ where $\tilde{x} = (x_1, \dots, x_n)$. For any input $\mathcal{E}'^+ \subseteq \mathcal{E}^+$ of
RESOLVECASE2, intervals and octagons satisfy
$\emptyset \neq H' := \{(h(\tilde{x}) \ge 0) \in H \mid \exists \tilde{v}, \tilde{u} \in \underline{\mathcal{E}'}^+.\ h(\tilde{v}) \ge 0 \land h(\tilde{u}) < 0\}$,
so Assumption 18 is satisfied by choosing the best
halfspace with respect to the quality measure from $H'$.
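The interval case can be sketched as follows in Python. A halfspace of the form ±(x_i − a_i) ≥ 0 is encoded as a triple `(sign, i, a_i)`; this encoding and the function names are assumptions of the sketch. The function `separating` computes the subset of halfspaces that have at least one given point on each side, which is the filtering step described above.

```python
def interval_halfspaces(points):
    """All halfspaces  sign * (x_i - a_i) >= 0  for example points a."""
    return {(s, i, a[i]) for a in points for i in range(len(a)) for s in (1, -1)}

def holds(h, x):
    """Does point x satisfy the halfspace h = (sign, i, a_i)?"""
    s, i, a = h
    return s * (x[i] - a) >= 0

def separating(halfspaces, points):
    """Halfspaces with at least one point inside and one point outside."""
    return {h for h in halfspaces
            if any(holds(h, p) for p in points)
            and any(not holds(h, p) for p in points)}
```

For the two points `(0, 0)` and `(2, 0)`, only the halfspaces `-(x_0 - 0) >= 0` and `x_0 - 2 >= 0` separate them; halfspaces on the second coordinate contain both points.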

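For each halfspace $(h(\tilde{x}) \ge 0) \in H'$, we calculate QUALITYMEASURE in
Algorithm 3, and choose one that maximizes QUALITYMEASURE($h$, $\mathcal{E}'^+$).
QUALITYMEASURE($h$, $\mathcal{E}'^+$) calculates the sum of the maximum number of satisfiable
constraints in each leaf divided by $h(\tilde{x}) \ge 0$ plus an additional term
$(|\mathcal{E}_{+-}| + |\mathcal{E}_{-+}|)(1 - \mathrm{entropy}(|\mathcal{E}_{+-}|, |\mathcal{E}_{-+}|))$ where
$\mathrm{entropy}(x, y) = -\frac{x}{x+y} \log_2 \frac{x}{x+y} - \frac{y}{x+y} \log_2 \frac{y}{x+y}$.
Therefore, the term $(|\mathcal{E}_{+-}| + |\mathcal{E}_{-+}|)(1 - \mathrm{entropy}(|\mathcal{E}_{+-}|, |\mathcal{E}_{-+}|))$ is
close to $|\mathcal{E}_{+-}| + |\mathcal{E}_{-+}|$ if almost all examples in $\mathcal{E}_{+-} \cup \mathcal{E}_{-+}$ cross $h$ in the same
direction and close to 0 if $|\mathcal{E}_{+-}|$ is almost equal to $|\mathcal{E}_{-+}|$.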
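The behaviour of the entropy term can be checked numerically with a small Python sketch. The function names are assumptions of the sketch, as is the usual convention that a term with probability 0 contributes 0 to the entropy (the text does not discuss this corner case).

```python
import math

def entropy(x, y):
    """Binary entropy of the split x : y, with 0 * log2(0) taken to be 0."""
    total = x + y
    if total == 0:
        return 0.0
    result = 0.0
    for n in (x, y):
        if n > 0:
            p = n / total
            result -= p * math.log2(p)
    return result

def crossing_term(n_plus_minus, n_minus_plus):
    """(|E+-| + |E-+|) * (1 - entropy(|E+-|, |E-+|))."""
    n = n_plus_minus + n_minus_plus
    return n * (1 - entropy(n_plus_minus, n_minus_plus))
```

When all ten crossing examples go in one direction, `crossing_term(10, 0)` equals the number of crossing examples; when they are split evenly, `crossing_term(5, 5)` vanishes.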


Lazy Strategy. In the lazy strategy, we lazily generate halfspaces. We divide the 
current polyhedron so that non-crossing examples in the cell point to almost the 
same direction. 

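First, we label states that occur in $\mathcal{E}^+_{C,C}$ as follows. We find a direction that
most examples in $C$ point to by solving the MAX-SMT problem
$\tilde{a} := \operatorname{argmax}_{\tilde{a}} |\{(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{C,C} \mid \tilde{a} \cdot (\tilde{v} - \tilde{v}') > 0\}|$.
For each $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{C,C}$, we label the two points $\tilde{v}, \tilde{v}'$ with
+1 if $\tilde{a} \cdot (\tilde{v} - \tilde{v}') > 0$ and with −1 otherwise.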

Then we apply weighted C-SVM to generate a hyperplane that separates 
most of the positive and negative points. To guarantee termination of Algo- 
rithm 1, we avoid “useless” hyperplanes that classify all the points by the same 
label. If we obtain such a useless hyperplane, then we undersample a majority 
class and apply C-SVM again. By undersampling suitably, we eventually get 
linearly separable data with at least one positive point and one negative point. 

Note that since coefficients of hyperplanes extracted from C-SVM are floating 
point numbers, we have to approximate them by hyperplanes with rational coef- 
ficients. This is done by truncating continued fraction expansions of coefficients 
by a suitable length. 
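Truncating a continued fraction expansion can be sketched in Python with exact rational arithmetic. The function name `truncate_cf` and the parameter `length` (the number of continued fraction terms kept, at least 1) are assumptions of the sketch; how the implementation picks a "suitable length" is not specified here.

```python
from fractions import Fraction
import math

def truncate_cf(x, length):
    """Rational approximation of x using the first `length` (>= 1) terms
    of its continued fraction expansion."""
    terms = []
    y = Fraction(x)               # exact value of the float (or int/Fraction)
    for _ in range(length):
        a = math.floor(y)
        terms.append(a)
        frac = y - a
        if frac == 0:             # the expansion is finite and ends here
            break
        y = 1 / frac
    # Fold the terms back into a single fraction, innermost term first.
    result = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        result = a + 1 / result
    return result
```

For example, keeping two terms of the expansion of `3.14159` gives the familiar approximation 22/7, and `0.75` is recovered exactly as 3/4 after three terms.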


Phase 3. In Line 5-18 of Algorithm 1, we further refine the segmentation $S(D)$
to resolve Case 3. Once Case 2 is resolved by RESOLVECASE2, Case 2 never
holds even after refining $S(D)$ further. This allows us to separate Phases 2 and 3.

Given a template tree $D$, we consider the set $C$ of constraints on parameters
in $D$ that claims $\tilde{f}_D(\tilde{x})$ is a ranking function for $\mathcal{E}^+$ (Line 6).

If $C$ is satisfiable, we use an SMT solver to obtain a solution of $C$ (i.e. an
assignment $\rho$ of integers to parameters) while minimizing the sum of absolute
values of unknown parameters in $D$ at the same time (Line 8). This minimization
is intended to give a simple candidate ranking function. The solution $\rho$ is used
to instantiate the template tree $D$ (Line 11).

If $C$ cannot be satisfied, there must be an implicit cycle in the dependency
graph $G(S(D), \mathcal{E}^+)$ by Theorem 17. The implicit cycle can be found in an
unsatisfiable core of $C$. We refine the segmentation of $D$ to cut the implicit cycle in
Line 16. To guarantee termination, we choose a halfspace satisfying the following
assumption, which is similar to Assumption 18.


Assumption 19. If halfspace $h(\tilde{x}) \ge 0$ is chosen in Line 16 of Algorithm 1,
then there exist $\tilde{v}, \tilde{u} \in \underline{\mathcal{E}}^+$ such that $h(\tilde{v}) \ge 0$ and $h(\tilde{u}) < 0$.


We have two strategies (eager and lazy) to refine the segmentation of $D$.

In the eager strategy, we choose a halfspace $(h(\tilde{x}) \ge 0) \in H$ that separates two
distinct points $\tilde{v}'_i$ and $\tilde{v}_{i+1}$ in the implicit cycle. In doing so, we want to reduce
the number of implicit cycles in $G(S(D), \mathcal{E}^+)$, but adding a new halfspace may
introduce new implicit cycles if there exists $(\tilde{v}, \tilde{v}') \in \mathcal{E}^+_{C,C}$ that crosses the new
border from the side of $\tilde{v}'_i$ to the side of $\tilde{v}_{i+1}$. Therefore, we choose a hyperplane
that minimizes the number of new crossing examples.

In the lazy strategy, we use an SMT solver to find a hyperplane $h(\tilde{x}) \in H$ that
separates $\tilde{v}'_i$ and $\tilde{v}_{i+1}$ and minimizes the number of new crossing examples.


Termination. Assumption 18 and Assumption 19 guarantee that every leaf
in $S(D)$ contains at least one point in the finite set $\underline{\mathcal{E}}^+$. Because the number
of leaves in $S(D)$ strictly increases after each iteration of Phase 2 and Phase 3,
in the worst case we eventually get a segmentation $S(D)$ where each $L \in S(D)$
contains only one point in $\underline{\mathcal{E}}^+$. Since we have excluded Case 1 at the beginning,
Theorem 17 guarantees the existence of a ranking function with the segmentation
$S(D)$. Therefore, the algorithm terminates within $|\underline{\mathcal{E}}^+|$ refinement steps.


Theorem 20. If Assumption 18 and Assumption 19 hold, then Algorithm 1
terminates. If Algorithm 1 returns a piecewise affine lexicographic function $\tilde{f}(\tilde{x})$,
then the function satisfies $R_{\tilde{f}}(\tilde{x}, \tilde{x}')$ for each $(\tilde{x}, \tilde{x}') \in \mathcal{E}^+$ where $\mathcal{E}^+$ is the input
of the algorithm.


5.5 Improvement by Degenerating Negative Values 


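There is another way to define a well-founded relation from the tuple
$\tilde{f}(\tilde{x}) = (f_k(\tilde{x}), \dots, f_0(\tilde{x}))$ of functions, that is, the well-founded relation
$R'_{\tilde{f}}(\tilde{x}, \tilde{x}')$ defined inductively by $R'_{()}(\tilde{x}, \tilde{x}') := \bot$ and

$$R'_{(f_k, \dots, f_0)}(\tilde{x}, \tilde{x}') := \bigl(f_k(\tilde{x}) \ge 0 \land f_k(\tilde{x}) > f_k(\tilde{x}')\bigr) \lor \bigl((f_k(\tilde{x}') < 0 \lor f_k(\tilde{x}) = f_k(\tilde{x}')) \land R'_{(f_{k-1}, \dots, f_0)}(\tilde{x}, \tilde{x}')\bigr).$$

In this definition, we loosen the equality $f_i(\tilde{x}) = f_i(\tilde{x}')$ (where $i = 1, \dots, k$)
of the usual lexicographic ordering (10) to $f_i(\tilde{x}') < 0 \lor f_i(\tilde{x}) = f_i(\tilde{x}')$. This
means that once $f_i(\tilde{x})$ becomes negative, $f_i(\tilde{x})$ must stay negative but the value
does not have to be the same, which is useful for the synthesizer to avoid complex
candidate lexicographic ranking functions and thus improves the performance.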
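The effect of this loosening can be seen in a small Python sketch that checks the relaxed relation on already evaluated tuples. The function name `relaxed_lex_rel` and the calling convention are assumptions of the sketch, which follows the reading of the definition given in the surrounding text: a component may be skipped when it is unchanged or when its new value is negative.

```python
def relaxed_lex_rel(fx, fx2):
    """fx  = (f_k(x),  ..., f_0(x)), fx2 = (f_k(x'), ..., f_0(x')).
    Relaxed lexicographic relation: equality on a component is loosened
    to 'the new value is negative or unchanged'."""
    for a, b in zip(fx, fx2):
        if a >= 0 and a > b:          # bounded strict decrease: done
            return True
        if not (b < 0 or a == b):     # cannot skip this component
            return False
    return False
```

For instance, the pair `(-1, 5)` and `(-3, 4)` is accepted: the first component changes but stays negative, and the second decreases from a nonnegative value. Under the usual ordering (10) the same pair would be rejected because the first component is not preserved.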

However, if we use this well-founded relation $R'_{\tilde{f}}(\tilde{x}, \tilde{x}')$ instead of $R_{\tilde{f}}(\tilde{x}, \tilde{x}')$
in (10), then Theorem 17 fails because $R'_{\tilde{f}}(\tilde{x}, \tilde{x}')$ is not necessarily subsumed
by $R'_{\tilde{f} + \tilde{c}}(\tilde{x}, \tilde{x}')$ where $\tilde{c} = (c_k, \dots, c_0)$ is a nonnegative constant (see the proof of
Proposition 16 and Theorem 17). As a result, there is a chance that no implicit
cycle can be found in line 14 of Algorithm 1. Therefore, when we use $R'_{\tilde{f}}(\tilde{x}, \tilde{x}')$,
we modify Algorithm 1 so that if no implicit cycle can be found in line 14, then
we fall back on the former definition of $R_{\tilde{f}}(\tilde{x}, \tilde{x}')$ and restart Algorithm 1.


6 Implementation and Evaluation


Implementation. We implemented a constraint solver MUVAL that supports
invariant synthesis and ranking function synthesis. For invariant synthesis, we
apply ordinary decision tree learning (see [12, 14, 18, 22, 36] for existing
techniques). For ranking function synthesis, we implemented the algorithm in Sect. 5
with both eager and lazy strategies for halfspace selection. Our synthesizer uses
the well-founded relation explained in Sect. 5.5. Given a benchmark, we run our
solver for both termination and non-termination verification in parallel, and
when one of the two returns an answer, we stop the other and use the answer.
MUVAL is written in OCaml and uses Z3 as an SMT solver backend. We used
clang and llvm2kittel [1] to convert C benchmarks to T2 [3] format files, which
are then translated to pwCSP by MUVAL.


Experiments. We evaluated our implementation MUVAL on C benchmarks from
Termination Competition 2020 (C Integer) [4]. We compared our tool with
APROVE [10,13], IRANKFINDER [7], and ULTIMATE AUTOMIZER [21]. Experiments
were conducted on StarExec [2] (CentOS 7.7 (1908) on Intel(R) Xeon(R)
CPU E5-2609 0 @ 2.40GHz (2393 MHz) with 263932744 kB main memory). The
time limit was 300s.


Results. Results are shown in Table 2. Yes/No/TO/U denote the numbers of
benchmarks for which the tools verified termination / verified
non-termination / could not answer within 300s and timed out (TimeOut) / gave
up before 300s (Unknown), respectively. We also show scatter plots of
runtime in Fig. 7.


Table 2. Numbers of solved benchmarks

Tool                 Yes           No            TO            U
MUVAL (eager)        204           [illegible]   [illegible]   0
MUVAL (lazy)         200           84            51            0
APROVE               216           100           16            3
IRANKFINDER (a)      [illegible]   [illegible]   0             34
ULTIMATE AUTOMIZER   180           83            2             70

(a) We removed one benchmark from the result of IRANKFINDER because the answer was wrong.


MUVAL was able to solve more benchmarks than ULTIMATE AUTOMIZER.
Compared to IRANKFINDER, MUVAL solved slightly fewer benchmarks, but was
faster in a large number of benchmarks: 265 benchmarks were solved faster by
MUVAL, 68 by IRANKFINDER, and 2 were solved by neither tool within 300s
(here, we regard U (unknown) as 300s). Compared to APROVE, MUVAL solved
fewer benchmarks. However, there are several benchmarks that MUVAL could
solve but APROVE could not. Among them is “TelAviv-Amir-Minimum_true-termination.c”,
which does require piecewise affine ranking functions. MUVAL
found a ranking function $f(x, y) = \mathbf{if}\ x - y \ge 0\ \mathbf{then}\ y\ \mathbf{else}\ x$, while APROVE
timed out.

We also observed that using CEGIS with transition examples itself showed
its strengths even for benchmarks that do not require piecewise affine ranking
functions. Notably, there are three benchmarks that MUVAL could solve but the
other tools could not; they are examples that do not require segmentations. Further
analysis of these benchmarks indicates the following strengths of our framework:
(1) the ability to handle nonlinear constraints (to some extent) thanks to
the example-based synthesis and the recent development of SMT solvers; and
(2) the ability to find a long lasso-shaped non-terminating trace assembled from
multiple transition examples. See [23, Appendix A] for details.


Fig. 7. Scatter plots of runtime (panels: IRANKFINDER v1.3.2, ULTIMATE AUTOMIZER,
APROVE). ULTIMATE AUTOMIZER and APROVE sometimes gave
up before the time limit, and such cases are regarded as 300s.


7 Related Work 


Many works synthesize ranking functions via constraint solving.
Among them is a counterexample-guided method like CEGIS [29]. CEGIS
is sound but not guaranteed to be complete in general: even if a given constraint 
has a solution, CEGIS may fail to find the solution. A complete method for 
ranking function synthesis is proposed in [19]. They collect only extremal coun- 
terexamples instead of arbitrary transition examples to avoid infinitely many 
examples. A limitation of their method is that the search space is limited to 
(lexicographic) affine ranking functions. 

Another counterexample-guided method is proposed in [33] and implemented 
in SEAHORN. This method can synthesize piecewise affine functions, but their 
approach is quite different from ours. Given a program, they construct a safety 
property that the number of loop iterations does not exceed the value of a 
candidate ranking function. The safety property is checked by a verifier. If it 
is violated, then a trace is obtained as a counterexample and the candidate 
ranking function is updated by the counterexample. The main difference from 
our method is that their method uses trace examples while our method uses 
transition examples (which is less expensive to handle). FREQTERM [15] also 
uses the connection to safety property, but they exploit syntax-guided synthesis 
for synthesizing ranking functions. 

Aside from counterexample-guided methods, constraint solving is widely
studied for affine ranking functions [27], lexicographic affine ranking
functions [5,7,24], and multiphase affine ranking functions [6,8]. Their
implementations include RANKFINDER and IRANKFINDER. Farkas’ lemma or Motzkin’s
transposition theorem are often used as a tool to transform ∃∀-constraints to
∃-constraints. However, when we apply this technique to piecewise affine ranking
functions, we get nonlinear constraints [24].

Abstract interpretation is also applied to segmented synthesis of ranking
functions and implemented in FUNCTION [32, 34, 35]. In this series of work, the
decision tree representation of ranking functions is used in [35] for better handling
of disjunctions. Compared to their work, we believe that our method is more
easily extensible to other theories than linear integer arithmetic as long as the
theories are supported by SMT solvers (although such extensions are out of the
scope of this paper).

Other state-of-the-art termination verifiers include the following. ULTIMATE 
AUTOMIZER [21] is an automata-based method. It repeatedly finds a trace and 
computes a termination argument that contains the trace until termination argu- 
ments cover the set of all traces. Büchi automata are used to handle such traces. 
APROVE [10,13] is based on term rewriting systems. 


8 Conclusions and Future Work 


In this paper, we proposed a novel decision tree-based synthesizer for ranking 
functions, which is integrated into the CEGIS architecture. The key observa- 
tion here was that we need to cope with explicit and implicit cycles contained in 
given examples. We designed a decision tree learning algorithm using the theoret- 
ical observation of the cycle detection theorem. We implemented the framework 
and observed that its performance is comparable to state-of-the-art termination 
analyzers. In particular, it solved three benchmarks that no other tool solved, 
a result that demonstrates the potential of the current combination of CEGIS, 
segmented synthesis, and transition examples. 

We plan to extend our ranking function synthesizer to a synthesizer of
piecewise affine ranking supermartingales. Ranking supermartingales [11] are a
probabilistic version of ranking functions and are used for verification of almost-sure
termination of probabilistic programs.

We also plan to implement a mechanism to automatically select a suitable
set of halfspaces with which decision trees are built. In our ranking function
synthesizer, intervals/octagons/octahedron/polyhedra can be used as the set of
halfspaces. However, selecting an overly expressive set of halfspaces may cause
the problem of overfitting [25] and result in poor performance. Therefore,
applying heuristics that adjust the expressiveness of halfspaces based on the current
examples may improve the performance of our tool.


Acknowledgement. We thank Andrea Peruffo and the anonymous referees for many
suggestions. This work was supported by JST ERATO HASUO Metamathematics for
Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Numbers

Abstract. Being able to argue about the performance of self-adjusting
data structures such as splay trees was a main objective when
Sleator and Tarjan introduced the notion of amortised complexity.

Analysing these data structures requires sophisticated potential func- 
tions, which typically contain logarithmic expressions. Possibly for these 
reasons, and despite the recent progress in automated resource analy- 
sis, they have so far eluded automation. In this paper, we report on 
the first fully-automated amortised complexity analysis of self-adjusting 
data structures. Following earlier work, our analysis is based on potential 
function templates with unknown coefficients. 

We make the following contributions: 1) We encode the search for 
concrete potential function coefficients as an optimisation problem over a 
suitable constraint system. Our target function steers the search towards 
coefficients that minimise the inferred amortised complexity. 2) Automa- 
tion is achieved by using a linear constraint system in conjunction with 
suitable lemmata schemes that encapsulate the required non-linear facts 
about the logarithm. We discuss our choices that achieve a scalable anal- 
ysis. 3) We present our tool ATLAS and report on experimental results 
for splay trees, splay heaps and pairing heaps. We completely automati- 
cally infer complexity estimates that match previous results (obtained by 
sophisticated pen-and-paper proofs), and in some cases even infer better 
complexity estimates than previously published. 


Keywords: Amortised cost analysis - Functional programming - 
Self-adjusting data structures - Automation - Constraint solving 


1 Introduction 


Amortised analysis, as introduced by Sleator and Tarjan [47,49], is a method 
for the worst-case cost analysis of data structures. The innovation of amortised 
analysis lies in considering the cost of a single data structure operation as part of 
a sequence of data structure operations. The methodology of amortised analysis 
allows one to assign a low (e.g., constant or logarithmic) amortised cost to a
data structure operation even though the worst-case cost of a single operation
might be high (e.g., linear, polynomial or worse). The setup of amortised analysis 
guarantees that for a sequence of data structure operations the worst-case cost 
is indeed the number of data structure operations times the amortised cost. 
In this way amortised cost analysis provides a methodology for worst-case cost 
analysis. Notably, the cost analysis of self-adjusting data structures, such as 
splay trees, has been a main objective already in the initial proposal of amortised 
analysis [47,49]. Analysing these data structures requires sophisticated potential 
functions, which typically contain logarithmic expressions. Possibly for these 
reasons, and despite the recent progress in automated complexity analysis, they 
have so far eluded automation. 

In this paper, we present the first fully-automated amortised cost analysis 
of self-adjusting data structures, that is, of splay trees, splay heaps and pairing 
heaps, which so far have only (semi-) manually been analysed in the literature. 
We implement and extend a recently proposed type-and-effect system for
amortised resource analysis [26,27]. This system belongs to a line of work (see
[20, 22–25, 28] and the references therein), where types are template potential functions
with unknown coefficients and the type-and-effect system extracts constraints 
over these coefficients in a syntax directed way from the program under analy- 
sis. Our work improves over [26,27] in three regards: 1) The approach of [26, 27] 
only supports type checking, i.e. verifying that a manually provided type is cor- 
rect. In this paper, we add an optimisation layer to the set-up of [26,27] in order 
to support type inference, i.e. our approach does not rely on manual annota- 
tions. Our target function steers the search towards coefficients that minimise 
the inferred amortised complexity. 2) The only case study of [26,27] is partial, 
focusing on the zig-zig case of the splay tree function splay, while we report 
on the full analysis of the operations of several data structures. 3) [26,27] does 
not report on a fully-automated analysis. Besides the requirement that the user 
needs to provide the resource annotation, the user also has to apply the struc- 
tural rules of the type system manually. Our tool ATLAS is able to analyse our 
benchmarks fully automatically. Achieving full automation required substantial 
implementation effort as the structural rules need to be applied carefully—as 
we learned during our experiments—in order to avoid a size explosion of the 
generated constraint system. We evaluate and discuss our design choices that 
lead to a scalable implementation. 

With our implementation and the obtained experimental results we make 
two contributions to the complexity analysis of data structures: 


1.) We automatically infer complexity estimates that match previous results 
(obtained by sophisticated pen-and-paper proofs), and in some cases even infer bet- 
ter complexity estimates than previously published. In Table 1, we state the com- 
plexity bounds computed by ATLAS next to results from the literature. We match 
or improve the results from [37,41,42]. To the best of our knowledge, the bounds for 
splay trees and splay heaps represent the state-of-the-art. In particular, we improve 
the bound for the delete function of splay trees and all bounds for the splay heap 
functions. For pairing heaps, Iacono [29,30] has proven (using a more involved 
potential function) that insert and merge have constant amortised complexity, 
Table 1. Amortised complexity bounds for splay trees (module name SplayTree,
abbrev. ST), splay heaps (SplayHeap, SH) and pairing heaps (PairingHeap, PH).

Function name    ATLAS (automated)                               [42] (manual)*                  [37] (semi-automated)
---------------  ----------------------------------------------  ------------------------------  ------------------------------
ST.splay         $3/2 \log_2(|t|)$                               $3/2 \log_2(|t|) + 1$           $3/2 \log_2(|t|) + 1$
ST.splay_max     $3/2 \log_2(|t|)$                               –                               $3/2 \log_2(|t|) + 1$
ST.insert        $2 \log_2(|t|) + 3/2$                           $2 \log_2(|t| + 1) + O(1)$      $2 \log_2(|t|) + 3/2$
ST.delete        $5/2 \log_2(|t|) + 3$                           $3 \log_2(|t| + 1) + O(1)$      $3 \log_2(|t|) + 2$
SH.partition     $3/4 \log_2(|t|) + \log_2(|t| + 1)$             –                               $2 \log_2(|t| + 1) + 1$
SH.insert        $3/4 \log_2(|t|) + \log_2(|t| + 1) + 3/2$       –                               $3 \log_2(|t| + 2) + 1$
SH.del_min       $\log_2(|t|)$                                   –                               $2 \log_2(|t| + 1) + 1$
PH.merge_pairs   $3/2 \log_2(|h|)$                               –                               $3 \log_2(|h|) + 4$
PH.insert        $1/2 \log_2(|h|)$                               –                               $\log_2(|h| + 1) + 1$
PH.merge         $1/2 \log_2(|h_1| + |h_2|) + 1$                 $1/2 \log_2(|h_1| + |h_2|)$     $\log_2(|h_1| + |h_2| + 1) + 2$
PH.del_min       $\log_2(|h|)$                                   $\log_2(|h|)$                   $3 \log_2(|h| + 1) + 4$

* [42] uses a different cost metric, i.e. the number of arithmetic comparisons, whereas we and
[37] count the number of (recursive) function applications. We adapted the results of [42] to
our cost metric to make the results easier to compare, i.e. the coefficients of the logarithmic
terms are smaller by a factor of 2 compared to [42].


while the other data structure operations continue to have an amortised complexity
of $k \log_2(|t|)$; while we leave an automated analysis based on Iacono’s potential
function for future work, we note that his coefficients k in the logarithmic terms 
are large, and that therefore the small coefficients in Table 1 are still of interest. 
We will detail below that we used a simpler potential function than [37,41,42] to 
obtain our results. Hence, also the new proofs of the confirmed complexity bounds 
can be considered a contribution. 


2.) We establish a new approach for the complexity analysis of data structures. 
Establishing the prior results in Table 1 required considerable effort. Schoenmak- 
ers studied in his PhD thesis [42] the best amortised complexity bounds that can 
be obtained using a parameterised potential function $\phi(t)$, where $t$ is a binary
tree, defined by $\phi(\mathsf{leaf}) := 0$ and
$\phi((l, d, r)) := \phi(l) + \beta \log_\alpha(|l| + |r|) + \phi(r)$,
for real-valued parameters $\alpha, \beta > 0$. Carrying out a sophisticated optimisation
with pen and paper, he concluded that the best bounds are obtained
by setting $\alpha = \sqrt[3]{4}$ and $\beta = 1/3$ for splay trees, and by setting
$\alpha = \sqrt{2}$ and $\beta = 1/2$ for pairing heaps (splay heaps were proposed only
some years later by Okasaki in [38]). Brinkop and Nipkow verify his complexity results for
splay trees in the theorem prover Isabelle [37]. They note that manipulating
the expressions corresponding to $\beta \log_\alpha(|t|)$ could only partly be automated.¹


¹ Nipkow et al. [37] state “The proofs in this subsection require highly nonlinear arithmetic.
Only some of the polynomial inequalities can be automated with Harrison’s
sum-of-squares method [16]”.
For splay heaps, there is to the best of our knowledge no previous attempt to
optimise the obtained complexity bounds, which might explain why our optimis- 
ing analysis was able to improve all bounds. For pairing heaps, Brinkop and Nip- 
kow did not use the optimal parameters reported by Schoenmakers—probably 
in order to avoid reasoning about polynomial inequalities—, which explains the 
worse complexity bounds. In contrast to the discussed approaches, we were able 
to verify and improve the previous results fully automatically. Our approach uses 
a variation of Schoenmakers’ potential function, where we roughly fix $\alpha = 2$ and
leave $\beta$ as a parameter for the optimisation phase (see Sect. 2 for more details).
Despite this choice, our approach was able to derive bounds that match and 
improve the previous results, which came as a surprise to us. Looking back at 
our experiments and interpreting the obtained results, we recognise that we 
might have been in luck with the particular choice of the potential function 
(because we can obtain the previous results despite fixing $\alpha = 2$). However,
we would not have expected that an automated analysis is able to match and 
improve all previously reported coefficients, which shows the power of the opti- 
misation phase. Thus, we believe that our results suggest a new approach for 
the complexity analysis of data structures. So far, self-adjusting data structures 
had to be analysed manually. This is possibly due to the use of sophisticated 
potential functions, which may contain logarithmic expressions. Both features 
are challenging for automated reasoning. Our results suggest the following
alternative (see Sects. 2 and 4.2 for more details): (i) fix a parameterised potential
function; (ii) derive a (linear) constraint system over the function parameters
from the AST of the program; (iii) capture the required non-linear reasoning in 
lemmata, and use Farkas’ lemma to integrate the application of these lemmata 
into the constraint system (in our case two lemmata, one about an arithmetic 
property and one about the monotonicity of the logarithm, were sufficient for 
all of our benchmarks); and finally (iv) find values for the parameters by an 
(optimising) constraint solver. We believe that our approach will carry over to 
other data structures: one needs to adapt the potential functions and add suit- 
able lemmata, but the overall setup will be the same. We compare the proposed 
methodology to program synthesis by sketching [48], where the synthesis engi- 
neer communicates her main insights to the synthesis engine (in our case the 
potential functions plus suitable lemmata), and a constraint solver then fills in 
the details. As a conclusion from our benchmarking, we observe that an automated
analysis of sophisticated data structures is possible without the need to
(i) resort to user guidance; (ii) forfeit optimal results; or (iii) be bogged down in 
computation times. These results also show how dependencies on properties of 
functional correctness of the code can be circumvented. 


Related Work. To the best of our knowledge, the automated amortised analysis of
self-adjusting data structures presented here is novel and unparalleled in
the literature. However, there is a vast amount of literature on (automated)
resource analysis. Without hope for completeness, we briefly mention
[1–7,9–11,14,15,17,18,20,22–25,39,44–46,52] for an overview of the field. Logarithmic
and sublinear bounds are typically not in the focus of the cited approaches, but
can be inferred by some tools. In the recurrence relations based approach to cost
analysis [1] refinements of linear ranking functions are combined with criteria 
for divide-and-conquer patterns; this allows the tool PUBS to recognise logarith- 
mic bounds for some problems, but examples such as mergesort or splaying are 
beyond the scope of this approach. Logarithmic and exponential terms are integrated
into the synthesis of ranking functions in [8], making use of an insightful
adaptation of Farkas’ and Handelman’s lemmas. The approach is able to handle
examples such as mergesort, but is again not suitable for self-balancing data
structures. A type-based approach to cost analysis for an ML-like language is
presented in [50], which uses the Master Theorem to handle divide-and-conquer- 
like recurrences. Recently, support for the Master Theorem was also integrated 
for the analysis of rewriting systems [51], extending [4] on the modular resource 
analysis of rewriting to so-called logically constrained rewriting systems [12]. 
The resulting approach also supports the fully automated analysis of mergesort. 


Structure. In Sects. 2 and 3 we review the type system of [26,27]. We sketch
the challenges to automation in Sect. 4 and present our contributions in Sects. 5
and 6. Finally, we conclude in Sect. 7.


2 Step by Step to an Automated Analysis of Splaying 


In this and the next section we sketch the theory developed by Hofmann et al.
in [27], in order to be able to present the contributions of this article in Sects. 4
and 5. For brevity, we restrict our exposition to those parts essential in the
analysis of a particular program code. As motivating example consider splay
trees, introduced by Sleator and Tarjan [47,49]. Splaying, which performs rotations,
is the most important operation on splay trees. Consider Fig. 1, a depiction
of the zig-zig case of splay, which implements splaying.

The analysis of [27] (see also [26]) is formulated in terms of the physicist’s 
method of amortised analysis in the style of Sleator and Tarjan [47,49]. The 
central idea of this approach is to assign a potential to the data structures 
of interest such that the difference in potential before and after executing a 
function is sufficient to pay for the actual cost of the function, i.e. one chooses 
potential functions $\phi, \psi$ such that $\phi(v) \geq c_f(v) + \psi(f(v))$ holds for all inputs $v$
to a function $f$, where $c_f(v)$ denotes the worst-case cost of executing function
$f$ on $v$. This generalises the original formulation, which can be seen by setting
$\phi(v) := a_f(v) + \psi(v)$, where $a_f(v)$ denotes the amortised cost of $f$.

In order to be able to analyse self-adjusting data structures such as splay 
trees, one needs potential functions that can express logarithmic amortised cost. 
Hofmann et al. [26,27] propose to make use of a variant of Schoenmakers’ potential,
$\mathsf{rk}(t)$ for a tree $t$, cf. [37,41,42], defined inductively by


$$\mathsf{rk}(\mathsf{leaf}) := 1 \qquad \mathsf{rk}((l, d, r)) := \mathsf{rk}(l) + \log_2(|l|) + \log_2(|r|) + \mathsf{rk}(r)\ ,$$


where $l$, $r$ are the left resp. right child of the tree $(l, d, r)$, $|t|$ denotes
the size of a tree (defined as the number of leaves of the tree), and $d$ is some
splay a t = match t with
  | (cl, c, cr) -> match cl with
    | (bl, b, br) -> let s = splay a bl in match s with
      | (al, a', ar) -> (al, a', (ar, b, (br, c, cr)))


Fig. 1. Zig-zig case of the splay function. 


data element that is ignored by the potential function. Besides Schoenmakers’
potential, further basic potential functions need to be added to the analysis: For
a sequence of $m$ trees $t_1, \dots, t_m$ and coefficients $a_i, b \in \mathbb{N}$, the potential function


$$p_{(a_1,\dots,a_m,b)}(t_1, \dots, t_m) := \log_2(a_1 \cdot |t_1| + \dots + a_m \cdot |t_m| + b)$$


denotes the logarithm of a linear combination of the sizes of the trees.

Following [37], we set the cost $c_{\mathsf{splay}}(t)$ of splaying a tree $t$ to be the number of
recursive calls to splay. Splaying and all operations that depend on splaying can
be done in $O(\log_2 n)$ amortised cost. Employing the above introduced potential
functions, the analysis of [27] is able to verify the following cost annotation for
splaying (the annotation needs to be provided by the user):


$$\mathsf{rk}(t) + 3 \cdot p_{(1,0)}(t) + 1 \geq c_{\mathsf{splay}}(t) + \mathsf{rk}(\mathsf{splay}\ a\ t)\ . \qquad (1)$$


From this result, one directly reads off $3 \cdot p_{(1,0)}(t) + 1 = 3 \cdot \log_2(|t|) + 1$ as
bound on the amortised cost of splaying.²

Building on earlier work [6,20,22–25,27,28], the analysis employs a type-and-effect system
that uses template potential functions, i.e. functions of a fixed shape with indeterminate
coefficients. The key challenge is to identify templates that are suitable
for logarithmic analysis and that are closed under the basic operations of the
considered programming language. For example, one introduces the coefficients
$q_*, q_{(1,0)}, q_{(0,2)}, q'_*, q'_{(1,0)}, q'_{(0,2)}$ and introduces the potential function templates


$$\Phi(t{:}\mathsf{T}|Q) := q_* \cdot \mathsf{rk}(t) + q_{(1,0)} \cdot p_{(1,0)}(t) + q_{(0,2)} \cdot p_{(0,2)}(t)$$
$$\Phi(\mathsf{splay}\ a\ t{:}\mathsf{T}|Q') := q'_* \cdot \mathsf{rk}(\mathsf{splay}\ a\ t) + q'_{(1,0)} \cdot p_{(1,0)}(\mathsf{splay}\ a\ t) + q'_{(0,2)} \cdot p_{(0,2)}(\mathsf{splay}\ a\ t)\ ,$$


for the input and output of the splay function. The type system then derives
constraints on the template function coefficients, as indicated in the sequel. We
take up further discussion of the constraint system, in particular how to maintain
a scalable analysis, in Sect. 4.

We explain the use of the type system on the motivating example. For brevity, 
type judgements and the type rules are presented in a simplified form. In par- 
ticular, we restrict our attention to tree types, denoted as T. This omission is 
inessential to the actual complexity analysis. For the full set of rules see [27]. 


² For ease of presentation, we elide the underlying semantics for now and simply write
“splay a t” for the resulting tree t′, obtained after evaluating splay a t.
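$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{\mathsf{splay}\colon \mathsf{T}|Q \to \mathsf{T}|Q'}
              {bl{:}\mathsf{T}|Q \vdash \mathsf{splay}\ a\ bl : \mathsf{T}|Q' - 1}\,(\mathsf{app})
        \quad
        \Delta|R \vdash^{\mathsf{cf}} \mathsf{splay}\ a\ bl : \mathsf{T}|R'
        \quad
        cr{:}\mathsf{T}, br{:}\mathsf{T}, s{:}\mathsf{T}|Q_4 \vdash \mathsf{match}\ s\ \mathsf{with}\ |(al, a', ar) \to t' : \mathsf{T}|Q'
      }{
        cr{:}\mathsf{T}, bl{:}\mathsf{T}, br{:}\mathsf{T}|Q_3 \vdash \mathsf{let}\ s = \mathsf{splay}\ a\ bl\ \mathsf{in}\ \ldots : \mathsf{T}|Q'
      }\,(\mathsf{let{:}T})
    }{
      cr{:}\mathsf{T}, bl{:}\mathsf{T}, br{:}\mathsf{T}|Q_2 \vdash \mathsf{let}\ s = \mathsf{splay}\ a\ bl\ \mathsf{in}\ \ldots : \mathsf{T}|Q'
    }\,(\mathsf{w})
  }{
    cl{:}\mathsf{T}, cr{:}\mathsf{T}|Q_1 \vdash \mathsf{match}\ cl\ \mathsf{with}\ |(bl, b, br) \to \ldots : \mathsf{T}|Q'
  }\,(\mathsf{match})
}{
  t{:}\mathsf{T}|Q \vdash \mathsf{match}\ t\ \mathsf{with}\ |(cl, c, cr) \to e_1 : \mathsf{T}|Q'
}\,(\mathsf{match})
$$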


Fig. 2. Partial typing derivation for the motivating example splay. 


Let $e$ denote the body of the function definition of splay a t, depicted in
Fig. 1. Our automated analysis infers an annotated type of splaying, by verifying
that the type judgement

$$t{:}\mathsf{T}|Q \vdash e : \mathsf{T}|Q'\ , \qquad (2)$$


is derivable. As above, types are decorated with annotations
$Q := [q_*, q_{(1,0)}, q_{(0,2)}]$ and $Q' := [q'_*, q'_{(1,0)}, q'_{(0,2)}]$, employed to express the potential
carried by the arguments to splay and its results.

The soundness theorem of the type system (Theorem 1) expresses that if
the above type judgement is derivable, then the total cost $c_{\mathsf{splay}}(t)$ of splaying
is bounded by the difference between $\Phi(t{:}\mathsf{T}|Q)$ and $\Phi(\mathsf{splay}\ a\ t{:}\mathsf{T}|Q')$, i.e.
$\Phi(t{:}\mathsf{T}|Q) \geq c_{\mathsf{splay}}(t) + \Phi(\mathsf{splay}\ a\ t{:}\mathsf{T}|Q')$. In particular, Eq. (1) can be derived
in this way.

We now provide an intuition on the type-and-effect system, stepping through
the code of Fig. 1. The corresponding type derivation tree is depicted in Fig. 2.
We note that the tree contains further annotations $Q_1$, $Q_2$, $Q_3$, $Q_4$ (besides the
annotations $Q$ and $Q'$) which again represent the unknown coefficients of potential
function templates. The goal of the type-and-effect system is to provide
constraints for each programming construct that connect the annotations in subsequent
derivation steps, e.g. $Q_2$ and $Q_3$. The type-and-effect system is
syntax-directed and formulates one rule per programming language construct.
We now discuss some of these rules for the partial derivation for splay.

The outermost command of e is a match statement, for which the following 
rule is applied: 


$$\frac{cl{:}\mathsf{T}, cr{:}\mathsf{T}|Q_1 \vdash e_1 : \mathsf{T}|Q'}{t{:}\mathsf{T}|Q \vdash \mathsf{match}\ t\ \mathsf{with}\ |(cl, c, cr) \to e_1 : \mathsf{T}|Q'}\ (\mathsf{match})$$


Here $e_1$ denotes the subexpression of $e$, which constitutes the nested pattern
match. Primarily, this is a standard type rule for pattern matching. The novelty
lies in the constraints on the annotations $Q$, $Q'$ and $Q_1$. More precisely, (match)
induces the constraints
$$q^1_1 = q^1_2 = q_* \qquad q^1_{(1,1,0)} = q_{(1,0)} \qquad q^1_{(1,0,0)} = q^1_{(0,1,0)} = q_* \qquad q^1_{(0,0,2)} = q_{(0,2)}\ ,$$


which can be directly read off the definition of $\mathsf{rk}(t) = \mathsf{rk}(cl) + \log_2(|cl|) +
\log_2(|cr|) + \mathsf{rk}(cr)$. Similarly, the nested match command, starting expression $e_1$,
is subject to the same rule; the resulting constraints amount to


$$q^2_1 = q^2_2 = q^2_3 \qquad q^2_{(0,0,0,2)} = q^1_{(0,0,2)} \qquad q^2_{(1,1,1,0)} = q^1_{(1,1,0)}$$
$$q^2_{(0,1,1,0)} = q^1_{(1,0,0)} \qquad q^2_{(1,0,0,0)} = q^1_{(0,1,0)} \qquad q^2_{(0,1,0,0)} = q^2_{(0,0,1,0)} = q^1_1\ .$$

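Besides the rules for programming language constructs, the type-and-effect
system contains structural rules, which operate on the type annotations themselves.
The weakening rule allows a suitable adaptation of the coefficients of the
potential function $\Phi(\Gamma|Q_2)$ to obtain a new potential function $\Phi(\Gamma|Q_3)$, where
we use the shorthand $\Gamma := cr{:}\mathsf{T}, bl{:}\mathsf{T}, br{:}\mathsf{T}$:


$$\frac{\Gamma|Q_3 \vdash \mathsf{let}\ s = \mathsf{splay}\ a\ bl\ \mathsf{in}\ \ldots : \mathsf{T}|Q' \qquad \Phi(\Gamma|Q_2) \geq \Phi(\Gamma|Q_3)}{\Gamma|Q_2 \vdash \mathsf{let}\ s = \mathsf{splay}\ a\ bl\ \mathsf{in}\ \ldots : \mathsf{T}|Q'}\ (\mathsf{w})$$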

The difficulty in applying the weakening rule consists in discharging the
constraint:


$$\Phi(\Gamma|Q_2) \geq \Phi(\Gamma|Q_3) \qquad (3)$$


Note that the comparison is to be performed symbolically, that is, abstracted from
the concrete value of the variables. We emphasise that this step can neither be
avoided, nor easily moved to the axioms of the derivation, as in related approaches
in the literature [19,21–23,28,31,35]. We use Farkas’ Lemma in conjunction with
two facts about the logarithm to linearise this symbolic comparison, namely the
monotonicity of the logarithm and the fact that $2 + \log_2(x) + \log_2(y) \leq 2\log_2(x + y)$
for all $x, y \geq 1$. For example, for the facts $\log_2(|bl|) \leq \log_2(|bl| + |br|)$ and
$2 + \log_2(|bl|) + \log_2(|cr| + |br|) \leq 2\log_2(|cr| + |bl| + |br|)$, we use Farkas’ Lemma
to generate the constraints


$$
\begin{aligned}
q^2_{(0,0,0,2)} + 2f &\geq q^3_{(0,0,0,2)} & q^2_{(0,1,0,0)} + f + g &\geq q^3_{(0,1,0,0)}\\
q^2_{(1,0,1,0)} + f &\geq q^3_{(1,0,1,0)} & q^2_{(0,1,1,0)} - g &\geq q^3_{(0,1,1,0)}\\
q^2_{(1,1,1,0)} - 2f &\geq q^3_{(1,1,1,0)}
\end{aligned}
$$


for some coefficients $f, g \geq 0$ introduced by Farkas’ Lemma. We note that Farkas’
Lemma can be interpreted as systematically exploring all positive-linear combinations
of the considered mathematical facts. This can be seen on the above
example: one can combine $g$ times the first fact with $f$ times the second fact.
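
The combination of the two facts can be spelled out as a worked step. This is a sketch in our own notation: we write $p_{(a_1,a_2,a_3,b)}$ for $\log_2(a_1|cr| + a_2|bl| + a_3|br| + b)$, so that $p_{(0,0,0,2)} = 1$, and use that every such term is non-negative.

```latex
\begin{align*}
g \cdot \bigl(p_{(0,1,0,0)} - p_{(0,1,1,0)}\bigr) &\leq 0
  && \text{(monotonicity, scaled by } g \geq 0\text{)}\\
f \cdot \bigl(2\,p_{(0,0,0,2)} + p_{(0,1,0,0)} + p_{(1,0,1,0)} - 2\,p_{(1,1,1,0)}\bigr) &\leq 0
  && \text{(second fact, scaled by } f \geq 0\text{)}\\[4pt]
\Phi(\Gamma|Q_2)
  \;\geq\; \Phi(\Gamma|Q_2)
  + g \cdot \bigl(p_{(0,1,0,0)} - p_{(0,1,1,0)}\bigr)
  &+ f \cdot \bigl(2\,p_{(0,0,0,2)} + p_{(0,1,0,0)} + p_{(1,0,1,0)} - 2\,p_{(1,1,1,0)}\bigr)
\end{align*}
```

The right-hand side of the last line is again a linear combination of the terms $p_{(\ldots)}$. It dominates $\Phi(\Gamma|Q_3)$ as soon as each of its coefficients is at least the corresponding coefficient of $Q_3$, which is what the linear constraints over $f$ and $g$ express.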
Next, we apply the rule for the let expression. This rule is the most involved
typing rule in the system proposed by Hofmann et al. [27].


$$\frac{\Delta|Q \vdash e_2 : \mathsf{T}|Q' - 1 \qquad \Delta|R \vdash^{\mathsf{cf}} e_2 : \mathsf{T}|R' \qquad \Theta|Q_4 \vdash e_3 : \mathsf{T}|Q'}{cr{:}\mathsf{T}, bl{:}\mathsf{T}, br{:}\mathsf{T}|Q_3 \vdash \mathsf{let}\ s = e_2\ \mathsf{in}\ e_3 : \mathsf{T}|Q'}\ (\mathsf{let{:}T})$$


Ignoring the annotations and in particular the second premise for a moment,
the type rule specifies a standard typing for a let expression. We note that,
as required by the rule, all variables in the type context $\Gamma$ occur at most once
in the let-expression. $\Gamma$ can then be split into contexts $\Delta := bl{:}\mathsf{T}$ and
$\Theta := cr{:}\mathsf{T}, br{:}\mathsf{T}$. Here, $e_2 := \mathsf{splay}\ a\ bl$ and $e_3$ denotes the last match statement
in $e$. The let-rule facilitates a splitting of the potential $Q_3$ for the evaluation
of $e_2$ and $e_3$ according to the type contexts $\Delta$ and $\Theta$. Abusing notation, the
distribution of potentials facilitated by the let-rule can be stated very roughly
as two “equalities”, that is, (i) “$Q_3 = Q + R + P$” and (ii) “$Q_4 = (Q' - 1) + R' + P$”.
(i) states that the potential $Q_3$ pays for evaluating the splay expression $e_2$ (with
and without costs, requiring the potentials $Q$ and $R$) and leaves the remainder
potential $P$. (ii) states that the potential $Q_4$ is constituted of the remainder
potential $P$ and of the potentials left after evaluating $e_2$ (with and without
costs, i.e. potentials $Q' - 1$ and $R'$). E.g. $Q_4$ is given by the following constraints


$$q^4_1 = q^3_1 \qquad q^4_3 = q'_* \qquad q^4_{(1,0,0,0)} = q^3_{(1,0,0,0)} \qquad q^4_{(1,1,1,0)} = r'_{(1,0)}$$
$$q^4_2 = q^3_3 \qquad q^4_{(0,1,0,0)} = q^3_{(0,0,1,0)} \qquad q^4_{(1,1,0,0)} = q^3_{(1,0,1,0)}$$


where the coefficients $q^3$ stem from the remainder potential of $Q_3$, the coefficient
$q'_*$ from $Q' - 1$ and $r'_{(1,0)}$ from $R'$.

The most original part of this type rule is the second premise
$\Delta|R \vdash^{\mathsf{cf}} \mathsf{splay}\ a\ bl : \mathsf{T}|R'$. Here, $\vdash^{\mathsf{cf}}$ denotes the same kind of typing judgement
as used in the overall typing derivation, but where all costs are set to zero
(hence the superscript cf, for cost-free). Let us assume $R = [r_{(1,0)}]$, $R' = [r'_{(1,0)}]$, and
that ATLAS was able to establish that


$$\Phi(bl{:}\mathsf{T}|R) = \log_2(|bl|) \geq \log_2(|s|) = \Phi(s{:}\mathsf{T}|R')\ , \qquad (4)$$


establishing the coefficients $r_{(1,0)} = 1$ and $r'_{(1,0)} = 1$. (We note that cost-free
typing derivations as in Eq. (4) constitute a size analysis that relates the sizes
of input and output.) Then, ATLAS infers from (4), taking advantage of the
monotonicity of $\log_2$, that


$$\log_2(|cr| + |bl| + |br|) \geq \log_2(|cr| + |br| + |s|)\ .$$


This inequality expresses that if the summand $\log_2(|cr| + |bl| + |br|)$ is included in
the potential $\Phi(\Gamma|Q_3)$, then the summand $\log_2(|cr| + |br| + |s|)$ may be included
in the potential $\Phi(cr{:}\mathsf{T}, br{:}\mathsf{T}, s{:}\mathsf{T}|Q_4)$. (The two logarithmic terms correspond
to the coefficients $q^3_{(1,1,1,0)}$ and $q^4_{(1,1,1,0)}$ in the constraints above.) Thus, the cost-free
derivation allows the potential $R$ to pass from $Q_3$, via $R'$, to $Q_4$. This is crucial
for being able to pay for the evaluation of $e_3$.

The let-rule has the three premises $\Delta|Q \vdash e_2 : \mathsf{T}|Q' - 1$, $\Delta|R \vdash^{\mathsf{cf}} e_2 : \mathsf{T}|R'$
and $\Theta|Q_4 \vdash e_3 : \mathsf{T}|Q'$. We focus here on the first premise and do not state the
derivations for the other two premises (such derivations can be found in [27]). The
judgement $\Delta|Q \vdash \mathsf{splay}\ a\ bl : \mathsf{T}|Q' - 1$ can be derived by the rule for function
application, which states a cost of 1 with regard to the type signature of splay,
represented by decrementing the potential induced by the annotation $Q'$.


$$\frac{\mathsf{splay}\colon \mathsf{T}|Q \to \mathsf{T}|Q'}{t{:}\mathsf{T}|Q \vdash \mathsf{splay}\ a\ t : \mathsf{T}|Q' - 1}\ (\mathsf{app})$$


The rule for function application is an axiom, and closes this branch of the 
typing derivation. This concludes the presentation of the partial type inference 
given in Fig. 2. Similarly to the above example of splay, estimates for the amor- 
tised costs of insertion and deletion on splay trees can be automatically inferred 
by our tool ATLAS. Further, our analysis handles similar self-adjusting data 
structures like pairing heaps and splay heaps (see Sect. 6.1). 


3 Technical Foundation 


In this short section, we provide a more detailed account of the formal system 
underlying our tool ATLAS. We state the soundness of the system in Theorem 1. 

A typing context is a mapping from variables $\mathcal{V}$ to types; typing contexts are denoted
by upper-case Greek letters. A program $\mathsf{P}$ is a set of typed function definitions of the form
$f(x_1, \dots, x_n) = e$, where the $x_i$ are variables and $e$ an expression. A substitution
(or an environment) $\sigma$ is a mapping from variables to values that respects types.
Substitutions are denoted as sets of assignments: $\sigma = \{x_1 \mapsto t_1, \dots, x_n \mapsto t_n\}$.
We employ a simple cost-sensitive big-step semantics based on eager evaluation,
dressed up with cost assertions. The judgement $\sigma \vdash^{\ell} e \Rightarrow v$ means that under
environment $\sigma$, expression $e$ is evaluated to value $v$ in exactly $\ell$ steps. Here
only rule applications emit (unit) costs. For brevity, the formal definition of the
semantics is omitted but can be found in [27].

In Sect. 2, we introduced a variant of Schoenmakers’ potential function,
denoted as $\mathsf{rk}(t)$, and the additional potential functions
$p_{(a_1,\dots,a_m,b)}(t_1, \dots, t_m) := \log_2(a_1 \cdot |t_1| + \dots + a_m \cdot |t_m| + b)$, denoting the $\log_2$ of a
linear combination of tree sizes. $\log_2$ denotes the logarithm to the base 2; throughout
the paper we stipulate $\log_2(0) := 0$ in order to avoid case distinctions. Note that the
constant function 1 is representable: $1 = \lambda t.\ \log_2(0 \cdot |t| + 2) = p_{(0,2)}$. We are now ready
to state the resource annotation of a sequence of trees:


Definition 1. A resource annotation or simply annotation of length $m$ is a
sequence $Q = [q_1, \dots, q_m] \cup [(q_{(a_1,\dots,a_m,b)})_{a_i,b \in \mathbb{N}}]$, vanishing almost everywhere.
Let $t_1, \dots, t_m$ be a sequence of trees. Then, the potential of $t_1, \dots, t_m$ wrt. $Q$ is
given by


$$\Phi(t_1, \dots, t_m|Q) := \sum_{i=1}^{m} q_i \cdot \mathsf{rk}(t_i) + \sum_{a_1,\dots,a_m,b \in \mathbb{N}} q_{(a_1,\dots,a_m,b)} \cdot p_{(a_1,\dots,a_m,b)}(t_1, \dots, t_m)\ .$$

In case of an annotation of length 1, we sometimes write $q_*$ instead of $q_1$, as
we already did above.


Example 1. Let $t$ be a tree, then its potential could be defined as follows:
$\mathsf{rk}(t) + 3 \cdot \log_2(|t|) + 1$. Wrt. the above definition this potential becomes representable
by setting $q_* := 1$, $q_{(1,0)} := 3$, $q_{(0,2)} := 1$. Thus, $\Phi(t|Q) = \mathsf{rk}(t) + 3 \cdot \log_2(|t|) + 1$.
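
As an illustration of Definition 1 and Example 1, the potential of an annotation can be evaluated directly. The sketch below is not part of ATLAS; it assumes trees encoded as None (leaf) or triples (left, key, right), represents an annotation by a list of rank coefficients plus a dictionary that maps index tuples (a_1, ..., a_m, b) to coefficients, and uses the hypothetical names potential and log2_0.

```python
import math

def size(t):
    """|t|: the number of leaves of t (None is a leaf)."""
    return 1 if t is None else size(t[0]) + size(t[2])

def rk(t):
    """rk(leaf) = 1, rk((l, d, r)) = rk(l) + log2|l| + log2|r| + rk(r)."""
    if t is None:
        return 1.0
    l, _, r = t
    return rk(l) + math.log2(size(l)) + math.log2(size(r)) + rk(r)

def log2_0(x):
    """log2 with the stipulation log2(0) := 0."""
    return 0.0 if x == 0 else math.log2(x)

def potential(trees, ranks, coeffs):
    """Phi(t_1..t_m | Q) for ranks = [q_1..q_m] and coeffs = {(a_1..a_m, b): q}.

    Coefficients that are absent from the dictionary are taken to be zero,
    mirroring the requirement that annotations vanish almost everywhere.
    """
    total = sum(q * rk(t) for q, t in zip(ranks, trees))
    for (*a, b), q in coeffs.items():
        linear = sum(ai * size(t) for ai, t in zip(a, trees)) + b
        total += q * log2_0(linear)
    return total

# Example 1: q_* = 1, q_(1,0) = 3, q_(0,2) = 1 on a tree with four leaves.
t = ((None, 1, None), 2, (None, 3, None))
example1 = potential([t], [1], {(1, 0): 3, (0, 2): 1})
```

Here rk(t) = 6 and log2(|t|) = 2, so example1 evaluates rk(t) + 3 log2(|t|) + 1 = 13.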


Let $\sigma$ be a substitution, let $\Gamma$ denote a typing context and let $x_1{:}\mathsf{T}, \dots, x_m{:}\mathsf{T}$
denote all tree types in $\Gamma$. A resource annotation for $\Gamma$ or simply annotation
is an annotation for the sequence of trees $x_1\sigma, \dots, x_m\sigma$. We define the
potential of the annotated context $\Gamma|Q$ wrt. a substitution $\sigma$ as
$\Phi(\sigma; \Gamma|Q) := \Phi(x_1\sigma, \dots, x_m\sigma|Q)$.


Definition 2. An annotated signature $\mathcal{F}$ maps functions $f$ to sets of pairs of
the annotation type for the arguments and the annotation type of the result:


$$\mathcal{F}(f) := \{\alpha_1 \times \dots \times \alpha_n|Q \to \beta|Q' : Q, Q' \text{ are annotations, } Q \text{ is of length } m\}\ .$$

We suppose $f$ takes $n$ arguments of which $m$ are trees; $m \leq n$ by definition.


Instead of $\alpha_1 \times \dots \times \alpha_n|Q \to \beta|Q' \in \mathcal{F}(f)$, we sometimes succinctly write
$f\colon \alpha_1 \times \dots \times \alpha_n|Q \to \beta|Q'$. The cost-free signature, denoted as $\mathcal{F}^{\mathsf{cf}}$, is similarly
defined.


Example 2. Consider the function splay from above. Its signature is formally
represented as $\mathsf{B} \times \mathsf{T}|Q \to \mathsf{T}|Q'$, where $Q := [q_*] \cup [(q_{(a,b)})_{a,b \in \mathbb{N}}]$ and
$Q' := [q'_*] \cup [(q'_{(a,b)})_{a,b \in \mathbb{N}}]$. We leave it to the reader to specify the coefficients in $Q$, $Q'$
so that the rule (app) as depicted in Sect. 2 can indeed be employed to type the
recursive call of splay.


Let $Q = [q_*] \cup [(q_{(a,b)})_{a,b \in \mathbb{N}}]$ be an annotation such that $q_{(0,2)} > 0$. Then
$Q' := Q - 1$ is defined as follows: $Q' = [q_*] \cup [(q'_{(a,b)})_{a,b \in \mathbb{N}}]$, where $q'_{(0,2)} := q_{(0,2)} - 1$
and for all $(a,b) \neq (0,2)$, $q'_{(a,b)} := q_{(a,b)}$. By definition the annotation coefficient
$q_{(0,2)}$ is the coefficient of the basic potential function $p_{(0,2)}(t) = \log_2(0|t| + 2) = 1$,
so the annotation $Q - 1$ decrements cost 1 from the potential induced by $Q$.
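
The decrement operation can be sketched in a few lines. The dictionary representation of the coefficients $q_{(a,b)}$ and the function name decrement are assumptions made for illustration only; the rank coefficient $q_*$ is unaffected by the operation and therefore left out.

```python
def decrement(coeffs):
    """Return the coefficients of Q - 1 for a length-1 annotation.

    coeffs maps index pairs (a, b) to the coefficients q_(a,b); missing
    entries count as zero. Only q_(0,2), the coefficient of the constant
    potential p_(0,2) = 1, is changed.
    """
    if coeffs.get((0, 2), 0) <= 0:
        raise ValueError("Q - 1 is only defined if q_(0,2) > 0")
    result = dict(coeffs)  # copy, so that Q itself is left untouched
    result[(0, 2)] -= 1
    return result
```

For the annotation of Example 1, decrement({(1, 0): 3, (0, 2): 1}) yields {(1, 0): 3, (0, 2): 0}, i.e. the potential rk(t) + 3 log2(|t|) with one unit of cost paid.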


Type-and-Effect System. The typing system makes use of a cost-free semantics,
which does not attribute any costs to the calculation. I.e. the rule (app) (Sect. 2)
is changed so that no cost is emitted. The cost-free application rule is denoted
as (app: cf). The cost-free typing judgement is written as $\Gamma|Q \vdash^{\mathsf{cf}} e : \alpha|Q'$. The
judgement $\Gamma|Q \vdash e : \alpha|Q'$ is governed by a plethora of typing rules. We have
illustrated several typing rules in Sect. 2 (the complete set of typing rules can be
found in [27]).

A program $\mathsf{P}$ is called well-typed if for any rule $f(x_1, \dots, x_k) = e \in \mathsf{P}$ and any
annotated signature $f\colon \alpha_1 \times \dots \times \alpha_k|Q \to \beta|Q'$, we have
$x_1{:}\alpha_1, \dots, x_k{:}\alpha_k|Q \vdash e : \beta|Q'$. A program $\mathsf{P}$ is called cost-free well-typed,
if the cost-free typing relation is employed.

Hofmann et al. establish the following soundness result:³


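Theorem 1 (Soundness Theorem). Let $\mathsf{P}$ be well-typed and let $\sigma$ be an environment.
Suppose $\Gamma|Q \vdash e : \alpha|Q'$ and $\sigma \vdash^{\ell} e \Rightarrow v$. Then $\Phi(\sigma; \Gamma|Q) - \Phi(v|Q') \geq \ell$.
Further, if $\Gamma|Q \vdash^{\mathsf{cf}} e : \alpha|Q'$, then $\Phi(\sigma; \Gamma|Q) \geq \Phi(v|Q')$.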


4 The Road to Automation, Continued 


The above sketched type-and-effect system, originally proposed in [27], is only a 
first step towards full automation. Several challenges need to be overcome, which 
we detail in this section. 


4.1 Type Checking 


Comparison between logarithmic expressions constitutes a first major challenge,
as such a comparison cannot be directly encoded as a linear constraint problem.
To achieve such linearisation, [27] makes use of the following: (i) a subtle and
surprisingly effective variant of Schoenmakers’ potential (see Sect. 2); (ii) mathematical
facts about the logarithm function (like Lemma 1 below), referred to
as expert knowledge; and finally (iii) Farkas’ Lemma for turning the universally-quantified
premise of the weakening rule into an existentially-quantified statement
that can be added to the constraint system (see Lemma 2).

A simple mathematical fact that is employed by Hofmann et al., following
earlier pen-and-paper proofs in the literature [37,38,41], reads as follows:


Lemma 1. Let $x, y \geq 1$. Then $2 + \log_2(x) + \log_2(y) \leq 2\log_2(x + y)$.
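
For completeness, the inequality of Lemma 1 can be checked by a short calculation. This is a standard argument added here for illustration; it uses only that $\log_2$ is monotone and that $2 = \log_2(4)$.

```latex
\begin{align*}
2 + \log_2(x) + \log_2(y) \leq 2\log_2(x + y)
  &\iff \log_2(4xy) \leq \log_2\bigl((x + y)^2\bigr)\\
  &\iff 4xy \leq (x + y)^2\\
  &\iff 0 \leq (x - y)^2 .
\end{align*}
```

The last inequality always holds, with equality exactly for $x = y$.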


We remark that our automated analysis shows that this lemma is not only
crucial in the analysis of splaying, but also for the other data structures we have
investigated. Further, Hofmann et al. state and prove the following variant of
Farkas’ Lemma, which lies at the heart of an effective transformation of comparison
demands like (3) into a linear constraint problem. Note that $\vec{u}$ and $\vec{f}$
denote column vectors of suitable length.


Lemma 2 (Farkas’ Lemma). Suppose $A\vec{x} \leq \vec{b}$, $\vec{x} \geq 0$ is solvable. Then the
following assertions are equivalent. (i) $\forall \vec{x} \geq 0.\ A\vec{x} \leq \vec{b} \Rightarrow \vec{u}^T\vec{x} \leq \lambda$ and (ii)
$\exists \vec{f} \geq 0.\ \vec{u}^T \leq \vec{f}^T A \wedge \vec{f}^T\vec{b} \leq \lambda$.


³ Note that soundness assumes a terminating execution $\sigma \vdash^{\ell} e \Rightarrow v$ of $\mathsf{P}$. We point out
that our analysis does not guarantee the termination of $\mathsf{P}$ for all environments $\sigma$.
The lemma allows the assumption of expert knowledge through the assumption
$A\vec{x} \leq \vec{b}$ for all $\vec{x} \geq 0$. Thus formalised expert knowledge is a clear
point of departure for additional information. E.g. Hofmann et al. [27] propose
the following potential extensions: (i) additional mathematical facts on the log
function; (ii) a dedicated size analysis; (iii) incorporation of basic static analysis
techniques. The incorporation of Farkas’ Lemma with suitable expert knowledge
is already essential for type checking, whenever the symbolic comparison (3) of the
weakening rule needs to be discharged.

ATLAS incorporates two facts into the expert knowledge: Lemma 1 and the
monotonicity of the logarithm (see Sect. 5). We found these two facts to be
sufficient for handling our benchmarks, i.e. expert knowledge of form (ii) and
(iii) was not needed. (We note though that we have experimented with adding a
dedicated size analysis (ii), which interestingly increased the solver performance,
despite generating a large constraint system.)

We indicate how ATLAS may be used to solve the constraints generated
for the example in Sect. 2. We recall the crucial application of the weakening
step between annotations $Q_2$ and $Q_3$. This weakening step can be automatically
discharged using the monotonicity of $\log_2$ and Lemma 1. (More precisely, ATLAS
employs the mode w{mono l2xy}; see Sect. 5.) For example, ATLAS is able to
verify the validity of the following concrete constants:


$Q_2$: $q_1 = q_2 = q_3 = 1$, $q_{(0,0,0,2)} = 1$, $q_{(0,0,1,0)} = 1$, $q_{(0,1,0,0)} = 1$, $q_{(0,1,1,0)} = 1$, $q_{(1,0,0,0)} = 1$, $q_{(1,1,1,0)} = 3$
$Q_3$: $q_1 = q_2 = q_3 = 1$, $q_{(0,0,0,2)} = 2$, $q_{(0,0,1,0)} = 1$, $q_{(0,1,0,0)} = 3$, $q_{(1,0,0,0)} = 1$, $q_{(1,0,1,0)} = 1$, $q_{(1,1,1,0)} = 1$


4.2 Type Inference 


We extend the type-and-effect system of [27] from type checking to type infer- 
ence. Further, we automate the application of structural rules like sharing or 
weakening, which have so far required user guidance. 

The two central contributions of this paper, as delineated in the introduction,
are based on significant improvements over the state of the art as described above.
Concretely, they came about through (i) a novel optimisation layer; (ii) careful
control of the structural rules; (iii) the generalisation of user-defined proof tactics
into an overall strategy of type inference; and (iv) provision of an automated
amortised analysis in the sense of Sleator and Tarjan. In the remainder of this section,
we discuss these stepping stones towards full automation in more detail.


Optimisation Layer. We add an optimisation layer to the set-up, in order to
support type inference. This allows for the inference of (optimal) type annotations
based on user-defined type annotations. For example, assume the user-provided
type annotation $\mathsf{rk}(t) + 3\log_2(|t|) + 1 \to \mathsf{rk}(\mathsf{splay}(t))$ can in principle
be checked automatically. Then, instead of checking this annotation, ATLAS
automatically optimises the signature by minimising the deduced coefficients.
(match (* t *) leaf
  (match (* cl *) ?
    (w{l2xy} (let:tree:cf (* s *)
      app (* splay_eq a bl *)
      (match leaf
        (let:tree:cf node (let:tree:cf node (w{mono} node))))))))


Fig. 3. Tactic that matches the zig-zig case of splay as shown in Fig. 1. 


(In Sect. 5 we discuss how this optimisation step is performed.) That is, ATLAS
reports the following annotation


$$\mathsf{splay}\colon\ \tfrac{1}{2}\,\mathsf{rk}(t) + \tfrac{3}{2}\log_2(|t|) \to \tfrac{1}{2}\,\mathsf{rk}(\mathsf{splay}(t))\ ,$$


which yields the optimal amortised cost of splaying of $3/2 \log_2(|t|)$. Optimality
here means that no better bound has been obtained by earlier pen-and-paper
verification methods (compare the discussion in Sect. 1).


Structural Rules. We observed that an unchecked application of the structural
rules, that is of the sharing and the weakening rule, quickly leads to an explosion
of the size of the constraint system and thus to de-facto unsolvable problems. To
wit, an earlier version of our implementation ran continuously (24/7) without
being able to infer a type for the complete definition of the function splay.⁴

The type-and-effect system proposed by Hofmann et al. is in principle linear, 
that is, variables occur at most once in the function body. For example, this is 
employed in the definition of the let-rule, cf. Sect. 2. However, a sharing rule is
admissible, which allows multiple occurrences of variables to be treated. Occurrences
of non-linear variables are suitably renamed apart and the carried potential is 
shared among the variants. (See [27] for the details.) The number of variables 
strongly influences the size of the constraint problem. Hence, eager application 
of the sharing rule proved infeasible. Instead, we restricted its application to 
individual program traces. For the considered benchmark examples, this removed 
the need for sharing altogether. 

With respect to weakening, a careful application of the weakening rule proved 
necessary for performance reasons: First, we apply weakening only selectively. 
Second, when applying weakening, we employ different levels of granularity. We 
may only perform a simple coefficient comparison, or we may apply monotonicity 
or Lemma 1 or both in conjunction with Farkas’ Lemma. We give the details in 
Sect. 5. 


Proof Tactics. Hofmann et al. [27] already propose user-defined proof plans,
so-called tactics, to improve the effectiveness of type checking. In combination
with our optimisation framework, tactics allow type annotations to be improved
significantly. To wit, ATLAS can be invoked with user-defined resource annotations
for the function splay, representing its “standard” amortised complexity
(e.g. copied from Okasaki’s book [38]), and an easily definable tactic, cf. Fig. 3.


4 The code ran single-threaded on an AMD® Ryzen 7 3800 @ 3.90 GHz.
Then, ATLAS automatically derives the optimal bound reported above. Still, for
full automation, tactics are clearly not sufficient. In order to obtain type inference
in general, we developed a generalisation of all the tactics that proved useful on
our benchmarks and incorporated this proof search strategy into the type inference
algorithm. Using this, the aforementioned (unsuccessful) week-long quest
for a type inference of splaying can now be successfully answered (in an optimal
form) in mere minutes.

We argue that the proof search strategy of ATLAS for full automation is
free of bias towards the provided complexity analysis. As detailed in Sect. 5, the
heuristics incorporate common design principles of the data structures analysed.
Thus, we exploit recurring patterns in the input (destructuring of input trees,
handling of base and recursive cases, rotations), not in the solution. The situation
is similar to the choice of the potential functions, which we expect to generalise to
other data structures. Similarly, we expect the current proof search strategy to
generalise.


Automated Amortised Analysis. In Sect. 2, we provided a high-level introduction
to the potential method and remarked that Sleator and Tarjan’s original
formulation is re-obtained if the corresponding potential functions are defined
such that $\phi(v) := a_f(v) + \psi(v)$. We now discuss how we can extract
amortised complexities in the sense of Sleator and Tarjan from our approach.
Suppose we are interested in an amortised analysis of splay heaps. Then it suffices
to equate the right-hand sides of the annotated signatures of the splay heap
functions. That is, we set $\mathsf{del\_min} : T|Q_1 \to T|Q'$, $\mathsf{insert} : B \times T|Q_2 \to T|Q'$
and $\mathsf{partition} : B \times T|Q_3 \to T|Q'$ for some unknown resource annotations
$Q_1, Q_2, Q_3, Q'$. Note that we use the same annotation $Q'$ for all signatures. We
can then obtain a potential function from the annotation $Q'$ in the sense of
Sleator and Tarjan and deduce $Q_i - Q'$ as an upper bound on the amortised
complexity of the respective function. In Sect. 5, we discuss how to automatically
optimise $Q_i - Q'$ in order to minimise the amortised complexity bound.
This automated minimisation is the second major contribution of our work. Our 
results suggest a new approach for the complexity analysis of data structures. 
On the one hand, we obtain novel insights into the automated worst-case run- 
time complexity analysis of involved programs. On the other hand, we provide 
a proof-of-concept of a computer-aided analysis of amortised complexities of 
data-structures that so far have only been analysed manually. 


5 Implementation 


In this section, we present our tool ATLAS, which implements type inference for 
the type system presented in Sects. 2 and 3. ATLAS operates in three phases: 


1.) Preprocessing: ATLAS parses and normalises the input program.

2.) Generation of the Constraint System: ATLAS extracts constraints from the
normalised program according to the typing rules (as sketched in Sect. 2).

3.) Solving: the derived constraint system is handed to an optimising constraint
solver, and the solver output is converted into a type annotation.




LNF[if a < a'
    then (l, a, (leaf, a', r))
    else ((l, a', leaf), a, r)]


let x1 = a < a' in if x1
  then LNF[(l, a, (leaf, a', r))]
  else LNF[((l, a', leaf), a, r)]


let x1 = a < a' in if x1
  then let x2 = leaf in let x3 = (x2, a', r) in (l, a, x3)
  else let x4 = leaf in let x5 = (l, a', x4) in (x5, a, r)

Fig. 4. Preprocessing: let normal forms. 


In terms of overall resource requirements, the bottleneck of the system is phase 
three. Preprocessing is both simple and fast. While the code implementing con- 
straint generation might be complex, its execution is fast. All of the underlying 
complexity is shifted into the third phase. On modern machines with multiple 
gibibytes of main memory, ATLAS is constrained by the CPU, and not by the 
available memory. In the remainder of this section, we first detail these phases of
ATLAS. We then go into more detail on the second phase. Finally, we elaborate on
the optimisation function, which is the key enabler of type inference.


5.1 The Three Phases of ATLAS 


1.) Preprocessing. The parser used in the first phase is generated with ANTLR
(see footnote 5), and the transformation of the syntax is implemented in Java. The
preprocessing performs two tasks: (i) Transformation of the input program into
let-normal-form, which is the form of program input required by our type system.
(ii) The unsharing conversion, which creates explicit copies for variables that are
used multiple times. Making multiple uses of a variable explicit is required by the
let-rule of the type system.

In order to satisfy the requirement of the let-rule, it is actually sufficient 
to track variable usage on the level of program paths. It turns out that in our 
benchmarks variables are only used multiple times in different branches of an 
if-statement, for which no unsharing conversion is needed. Hence, we do not 
discuss the unsharing conversion further in this paper and refer the interested 
reader to [27] for more details. 


Let-Normal-Form Conversion. The let-normal-form conversion is performed 
recursively and rewrites composed expressions into simple expressions, where 
each operator is only applied to a variable or a constant. This conversion is 
achieved by introducing additional let-constructs. We exemplify let-normal-form 
conversion on a code snippet in Fig. 4. 
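
The conversion can be illustrated with a small sketch. This is not the Java implementation of ATLAS; the tuple-based expression encoding, the function names (`to_lnf`, `atomise`, `simple`, `wrap`, `render`) and the `x1, x2, ...` naming scheme are assumptions made for illustration. Variables are strings, `leaf` is the only constant, and `("node", l, a, r)`, `("lt", a, b)` and `("if", c, t, e)` are the only compound forms.

```python
from itertools import count


def render(tag, args):
    """Print one operator applied to atomic arguments."""
    if tag == "lt":
        return f"{args[0]} < {args[1]}"
    if tag == "node":
        return "(" + ", ".join(args) + ")"
    raise ValueError(f"unknown operator: {tag}")


def wrap(binds, body):
    """Prefix body with one let per binding, first binding outermost."""
    for name, rhs in reversed(binds):
        body = f"let {name} = {rhs} in {body}"
    return body


def atomise(expr, fresh):
    """Return (bindings, atom), where atom is a variable that names expr."""
    if isinstance(expr, str) and expr != "leaf":
        return [], expr  # already a variable
    binds, rhs = simple(expr, fresh)
    name = f"x{next(fresh)}"
    return binds + [(name, rhs)], name


def simple(expr, fresh):
    """Return (bindings, text), where text applies one operator to atoms."""
    if expr == "leaf":
        return [], "leaf"
    binds, args = [], []
    for arg in expr[1:]:
        arg_binds, atom = atomise(arg, fresh)
        binds += arg_binds
        args.append(atom)
    return binds, render(expr[0], args)


def to_lnf(expr, fresh=None):
    """Rewrite expr so that every operator is applied to variables only."""
    fresh = count(1) if fresh is None else fresh
    if isinstance(expr, str):
        return expr
    if expr[0] == "if":
        _, cond, then, other = expr
        binds, c = atomise(cond, fresh)
        then_txt = to_lnf(then, fresh)   # then-branch is numbered first
        else_txt = to_lnf(other, fresh)
        return wrap(binds, f"if {c} then {then_txt} else {else_txt}")
    binds, text = simple(expr, fresh)
    return wrap(binds, text)


# The snippet of Fig. 4 in the assumed tuple encoding.
EXAMPLE = ("if", ("lt", "a", "a'"),
           ("node", "l", "a", ("node", "leaf", "a'", "r")),
           ("node", ("node", "l", "a'", "leaf"), "a", "r"))

print(to_lnf(EXAMPLE))
```

On the snippet of Fig. 4 the sketch introduces the same five let-bound variables as the figure; the expression in tail position of each branch is left unbound, as in the figure.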


2.) Generation of the Constraint System. After preprocessing, we apply the typing
rules. Importantly, the application of all typing rules, except for the weakening
rule, which we discuss in further detail below, is syntax-directed: each node of the
AST of the input program dictates which typing rule is to be applied. The weakening
rule could in principle be applied at each AST node, giving the constraint solver
more freedom to find a solution. This degree of freedom needs to be controlled by
the tool designer. In addition, recall that the suggested implementation of the
weakening rule (see Sect. 4.1) is parameterised by the expert knowledge fed into it.
In our experiments we noticed that the weakening rule has to be applied sparingly
in order to avoid an explosion of the resulting constraint system.


5 See antlr.org.

We summarise the degrees of freedom available to the tool designer, which can
be specified as parameters to ATLAS at source level: 1.) The selected template
potential functions, i.e. the family of indices $\vec{a}, b$ for which coefficients
$q_{(\vec{a},b)}$ are generated (coefficients that are not explicitly generated are
assumed to be zero). 2.) The number of annotated signatures (with costs and
without costs) for each function. 3.) The policy for applying the (parameterised)
weakening rule.

We detail our choices for instantiating the above degrees of freedom in 
Sect. 5.2. 


3.) Solving. For solving the generated constraint system, we rely on the Z3 
SMT solver. We employ Z3’s Java bindings, load Z3 as a shared library, and 
exchange constraints for solutions. ATLAS forwards user-supplied configuration 
to Z3, which allows for flexible tuning of solver parameters. We also record 
Z3’s statistics, most importantly memory usage. During the implementation 
of ATLAS, Z3’s feature to extract unsatisfiable cores has proven valuable. It 
supplied us with many counterexamples, often directly pinpointing bugs in our 
implementation. The tool exports constraint systems in SMT-LIB format to the 
file system. This way, solutions could be cross-checked by re-computing them 
with other SMT solvers that support minimisation, such as OptiMathSAT [43]. 


5.2 Details on the Generation of the Constraint System 


We now discuss our choices for the aforementioned degrees of freedom. 


Potential Function Templates. Following [27], we create for each node in the AST
of the considered input program, where n variables of tree type are currently in
context, the coefficients $q_1, \dots, q_n$ for the rank functions and the coefficients
$q_{(\vec{a},b)}$ for the logarithmic terms, where $\vec{a} \in \{0,1\}^n$ and $b \in \{0,2\}$. This choice
turned out to be sufficient in our experiments.


Number of Function Signatures. We fix the number of annotations for each
function $f : \alpha_1 \times \dots \times \alpha_n | Q \to \beta | Q'$ to one regular and one cost-free
signature. This was sufficient for our experiments.


[Figure: Hasse diagram with $x_{(1,1,2)}$ at the top; $x_{(0,1,2)}$, $x_{(1,0,2)}$ and $x_{(1,1,0)}$ below it;
then $x_{(0,0,2)}$, $x_{(0,1,0)}$ and $x_{(1,0,0)}$; and $x_{(0,0,0)}$ at the bottom.]

Fig. 5. Monotonicity Lattice for |Q| = 2.


Weakening. We need to discharge symbolic comparisons of the form
$\Phi(\Gamma|P) \le \Phi(\Gamma|Q)$. As indicated in Sect. 4, we employ Farkas’ Lemma to derive
constraints for the weakening rule. For a context $\Gamma = t_1, \dots, t_n$, we introduce
variables $x_{(\vec{a},b)}$, where $\vec{a} \in \{0,1\}^n$ and $b \in \{0,2\}$, which represent the potential
functions $p_{(\vec{a},b)} = \log_2(a_1|t_1| + \dots + a_n|t_n| + b)$. Next, we explain how the
monotonicity of $\log_2$ and Lemma 1 can be used to derive inequalities on the
variables $x_{(\vec{a},b)}$, which can then be used to instantiate matrix A in Farkas’ Lemma
as stated in Sect. 4.


Monotonicity. We observe that $p_{(\vec{a},b)} = \log_2(a_1|t_1| + \dots + a_n|t_n| + b) \le
\log_2(a'_1|t_1| + \dots + a'_n|t_n| + b') = p_{(\vec{a}',b')}$ if $a_1 \le a'_1, \dots, a_n \le a'_n$ and $b \le b'$.
This allows us to obtain the lattice shown in Fig. 5. A path from $x_{(\vec{a}',b')}$ to $x_{(\vec{a},b)}$
signifies $x_{(\vec{a},b)} \le x_{(\vec{a}',b')}$, resp. $x_{(\vec{a},b)} - x_{(\vec{a}',b')} \le 0$, represented by a row with
coefficients 1 and −1 in the corresponding columns of matrix A.


Mathematical Facts, Like Lemma 1. For an annotated context of length 2,
Lemma 1 can be stated by the inequality $2x_{(0,0,2)} + x_{(0,1,0)} + x_{(1,0,0)} - 2x_{(1,1,0)} \le 0$;
we add a corresponding row with coefficients 2, 1, 1, −2 to the matrix A. Likewise,
for contexts of length > 2, we add, for each subset of 2 variables, a row
with coefficients 2, 1, 1, −2, setting the coefficients of all other variables to 0.


Sparse Expert Knowledge Matrix. We observe for both kinds of constraints that
matrix A is sparse. We exploit this in our implementation and only store non-zero
coefficients.
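
The two kinds of rows can be generated mechanically. The following sketch is an illustration under assumptions, not the data structure used inside ATLAS: each row is a dictionary from index vectors (a_1, ..., a_n, b) to non-zero coefficients, monotonicity contributes one row per edge of the lattice (longer paths then follow by transitivity), and the function names are hypothetical.

```python
from itertools import combinations, product
from math import log2


def indices(n):
    """All index vectors (a_1, ..., a_n, b) with a_i in {0, 1} and b in {0, 2}."""
    return [a + (b,) for a in product((0, 1), repeat=n) for b in (0, 2)]


def monotonicity_rows(n):
    """One sparse row per lattice edge, encoding x_lower - x_upper <= 0."""
    rows = []
    for lower in indices(n):
        for pos in range(n + 1):
            if lower[pos] == 0:
                raised = 2 if pos == n else 1  # b steps 0 -> 2, a_i steps 0 -> 1
                upper = lower[:pos] + (raised,) + lower[pos + 1:]
                rows.append({lower: 1, upper: -1})
    return rows


def lemma1_rows(n):
    """One sparse row with coefficients 2, 1, 1, -2 per pair of variables."""
    rows = []
    const = (0,) * n + (2,)
    for i, j in combinations(range(n), 2):
        only_i = tuple(int(k == i) for k in range(n)) + (0,)
        only_j = tuple(int(k == j) for k in range(n)) + (0,)
        both = tuple(int(k in (i, j)) for k in range(n)) + (0,)
        rows.append({const: 2, only_i: 1, only_j: 1, both: -2})
    return rows


def row_value(row, sizes):
    """Evaluate a row at concrete tree sizes, with x_(a,b) = log2(a . sizes + b)."""
    return sum(coeff * log2(sum(a * s for a, s in zip(idx, sizes)) + idx[-1])
               for idx, coeff in row.items())


print(len(monotonicity_rows(2)), "monotonicity rows for a context of length 2")
print(lemma1_rows(2))
print(row_value(lemma1_rows(2)[0], (3, 5)) <= 0)
```

Storing only the non-zero entries keeps each row at two or four entries regardless of the context length. The last line spot-checks the Lemma 1 row at tree sizes 3 and 5; `row_value` must not be called on rows that mention the all-zero index, since log2(0) is undefined.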


Parametrisation of Weakening. Each application of the weakening rule is
parameterised by the matrix A. In our tool, we instantiate A with either the
constraints for (i) monotonicity, referenced as w{mono} for short; (ii) Lemma 1
(w{l2xy}); (iii) both (w{mono l2xy}); or (iv) none of the constraints (w).

In the last case, Farkas’ Lemma is not needed because weakening defaults to
point-wise comparison of the coefficients $p_{(\vec{a},b)}$, which can be implemented more
directly. Each time we apply weakening, we need to choose how to instantiate
matrix A. Our experiments demonstrate that we need to apply monotonicity
and Lemma 1 sparingly in order to avoid blowing up the constraint system.


Tactics and Automation. ATLAS supports manually applying the weakening 
rule—for this the user has to provide a tactic—and a fully-automated mode. 


Naive Automation. Our first attempt at automation applied the weakening rule
everywhere, instantiated with the full amount of available expert knowledge. This
approach did not scale.


Manual Mode via Tactics. A tactic is given as a text file that contains a tree of
rule names corresponding to the AST nodes of the input program, into which
the user can insert applications of the weakening rule, parameterised by the
expert knowledge that should be applied. A simple tactic is depicted in Fig. 3.
Tactics are distributed with ATLAS, see [32]. The user can name sub-trees for
reference in the result of the analysis and include ML-style comments in the
tactics text. We provide two special commands that allow the user to directly
deal with a whole branch of the input program: The question mark (?) allows
partial proofs; no constraints will be created for the part of the program thus
marked. The underscore (_) switches to the naive automation of ATLAS and
will apply the weakening rule with full expert knowledge everywhere. Both ?
and _ were invaluable when developing and debugging the automated mode. We
note that the manual mode still achieves solving times that are an order of
magnitude faster than the automated mode, which may be of interest to a user
willing to hand-optimise solving times.


Automated Mode. For automation, we extracted common patterns from the tactics
we developed manually: Weakening with mode w{mono} is applied before
(var) and (leaf); w{mono l2xy} is applied only before (app). (We recall that the
full set of rules employed by our analysis can be found in [27].) Further, for AST
subtrees that construct trees, i.e. which only consist of (node), (var) and (leaf)
rule applications, we apply w{mono} for each inner node and w{l2xy} for each
outermost node. For all other cases, no weakening is applied. This approach is
sufficient to cover all benchmarks, with further improvements possible.


5.3 Optimisation 


Given an annotated function $f : \alpha_1 \times \dots \times \alpha_n | Q \to \beta | Q'$, we want to find
values for the coefficients of the resource annotations $Q$ and $Q'$ that minimise
$\Phi(\Gamma|Q) - \Phi(\Gamma|Q')$, since this difference is an upper bound on the amortised
cost of f, cf. Sect. 4.2. However, as with weakening, we cannot directly express
such a minimisation, and again resort to linearisation: We choose an optimisation
function that directly maps from $Q$ and $Q'$ to $\mathbb{Q}$. Our optimisation function
combines four measures, three of which involve a difference between coefficients
of $Q$ and $Q'$, and a fourth one that only involves coefficients from $Q$ in order
to minimise the absolute values of the discovered coefficients. We first present
these measures for the special case of $|Q| = 1$.

The first measure $d_1(Q,Q') := q_* - q'_*$ reflects our goal of preserving the
coefficient for rk; note that for $d_1(Q,Q') \neq 0$, the resulting complexity bound
would be super-logarithmic. The second measure
$d_2(Q,Q') := \sum_{(a,b)} (q_{(a,b)} - q'_{(a,b)}) \cdot w(a,b)$ reflects the goal of achieving
logarithmic bounds that are as small as possible. Weights are defined to penalise
more complex terms, and to exclude constants. (Recall that 1 is representable as
$\log_2(0 + 2)$.) We set


$$w(a,b) := \begin{cases} 0 & \text{for } (a,b) = (0,2), \\ (a + (b+1)^2)^2 & \text{otherwise.} \end{cases}$$


The third measure $d_3(Q,Q') := q_{(0,2)} - q'_{(0,2)}$ reflects the goal of minimising
constant cost. Lastly, we set $d_4(Q,Q') := \sum_{(a,b)} q_{(a,b)}$ in order to obtain small
absolute numbers. The last measure does not influence bounds on the amortised
cost, but leads to more beautiful solutions. These measures are then composed to
the linear objective function $\min \sum_{i=1}^{4} d_i(Q,Q') \cdot w_i$. In our implementation, we
set $w_i = [16127, 997, 97, 2]$; these weights are chosen (almost) arbitrarily, we only
noticed that $w_1$ must be sufficiently large to guarantee its priority. (We note that
these weights were sufficient for our experiments; we refer to the literature for
more principled ways of choosing the weights of an aggregated cost function [34].)
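
To make the objective concrete, the following sketch evaluates it for |Q| = 1 on the two splay annotations discussed in Sect. 4. The dictionary encoding of annotations (key "rk" for the rank coefficient, keys (a, b) for the logarithmic coefficients) and all function names are assumptions made for illustration; only the measures, the weight function w and the weights w_i are taken from the text. In ATLAS the objective is minimised by the solver over unknown coefficients, whereas here it is merely evaluated at fixed ones.

```python
from fractions import Fraction

# Indices (a, b) of the logarithmic terms log2(a*|t| + b) for a single tree.
INDICES = [(a, b) for a in (0, 1) for b in (0, 2)]
WEIGHTS = (16127, 997, 97, 2)  # w_1 .. w_4 as stated in the text


def w(a, b):
    """Penalty of the term log2(a*|t| + b); the constant term (0, 2) is excluded."""
    return 0 if (a, b) == (0, 2) else (a + (b + 1) ** 2) ** 2


def annotation(rk, logs):
    """Build an annotation; coefficients that are not given default to zero."""
    q = {idx: Fraction(0) for idx in INDICES}
    q.update(logs)
    q["rk"] = Fraction(rk)
    return q


def objective(q, q_prime):
    """Weighted sum of the four measures d_1 .. d_4 for |Q| = 1."""
    d1 = q["rk"] - q_prime["rk"]
    d2 = sum((q[i] - q_prime[i]) * w(*i) for i in INDICES)
    d3 = q[(0, 2)] - q_prime[(0, 2)]
    d4 = sum(q[i] for i in INDICES)
    return sum(d * wt for d, wt in zip((d1, d2, d3, d4), WEIGHTS))


# rk(t) + 3 log2(|t|) + 1 -> rk(splay(t)); note that 1 = log2(0*|t| + 2).
USER_IN = annotation(1, {(1, 0): 3, (0, 2): 1})
USER_OUT = annotation(1, {})

# 1/2 rk(t) + 3/2 log2(|t|) -> 1/2 rk(splay(t)).
OPT_IN = annotation(Fraction(1, 2), {(1, 0): Fraction(3, 2)})
OPT_OUT = annotation(Fraction(1, 2), {})

print(objective(USER_IN, USER_OUT))  # 12069
print(objective(OPT_IN, OPT_OUT))    # 5985
```

Both annotations preserve the rank coefficient, so d_1 vanishes in each case, and the difference in the objective stems from the smaller logarithmic and constant coefficients of the optimised signature.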
Multiple Arguments. For $|Q| > 1$, we set $d_1(Q,Q') := \sum_{i=1}^{n} q_i - q'_*$ and
$d_2(Q,Q') := \sum_{(a,b)} (q_{(a,\dots,a,b)} - q'_{(a,b)}) \cdot w(a,b)$. The required changes for $d_3$ and $d_4$ are
straightforward. In our benchmarks, there is only one function (merge of pairing
heaps) that requires this minimisation function.


6 Evaluation 


We first describe the benchmark functions employed to evaluate ATLAS and then 
detail this experimental evaluation, already depicted in Table 1. 


6.1 Automated Analysis of Splaying et al. 


Splay Trees. Introduced by Sleator and Tarjan [47,49], splay trees are self- 
adjusting binary search trees with strictly increasing in-order traversal, but with- 
out an explicit balancing condition. Based on splaying, searching is performed 
by splaying with the sought element and comparing to the root of the result. 
Similarly, insertion and deletion are based on splaying. Above we used the zig-zig
case of splaying, depicted in Fig. 1, as a motivating code example. While the
pen-and-paper analysis of this case is the most involved, type inference for this case
alone did not directly yield the desired automation of the complete definition.
Rather, full automation required substantial implementation effort, as detailed
in Sect. 5. As already emphasised, it came as a surprise to us that our tool ATLAS
is able to match and partly improve upon the sophisticated optimisations
performed by Schoenmakers [41,42]. This seems to be evidence of the versatility
of the employed potential functions. Further, we leverage the sophistication of
our optimisation layer in conjunction with the current power of state-of-the-art
constraint solvers, like Z3 [36].


Splay Heaps. To overcome deficiencies of splay trees when implemented func- 
tionally, Okasaki introduced splay heaps. Splay heaps are defined similarly to 
splay trees and their (manual) amortised cost analysis follows similar patterns 
as the one for splay trees. Due to the similarity in the definitions between splay 
heaps and splay trees, extension of our experimental results in this direction 
did not pose any problems. Notably, however, ATLAS improves the known com- 
plexity bounds on the amortised complexity for the functions studied. We also 
remark that typical assumptions made in pen-and-paper proofs are automati- 
cally discharged by our approach: Schoenmakers [41,42] as well as Nipkow and 
Brinkop [37] make use of the (obvious) fact that the size of the resulting tree t’ 
or heap h’ equals the size of the input. As discussed, this information is captured 
by a cost-free derivation, cf. Sect. 2. 


Pairing Heaps. These are another implementation of heaps, which are represented
as binary trees, subject to the invariant that they are either leaf, or the
right child is leaf. The left child can be regarded as a list of pairing heaps.
Schoenmakers and Nipkow et al. provide a (semi-)manual analysis of pairing
heaps, which ATLAS can verify or even improve fully automatically. We note that
we analyse a single function merge_pairs, whereas [37] breaks down the analysis
and studies two functions pass_1 and pass_2 with merge_pairs = pass_2 o pass_1.
All definitions can be found at [33].


Table 2.

Function            Weakening (w)  automated (naive)     automated (improved)  manual
                                   Constraints  Time     Constraints  Time     Constraints  Time
ST.splay (zig-zig)  Selective      n/a          n/a      7718         18S      2552         <1S
                    All            11792        45S      9984         19S      2864         <1S
ST.splay            Selective      n/a          n/a      42095        8M11S    19111        12S
                    All            68103        t/o 24H  54377        14M19S   23323        1M27S
SH.partition        Selective      n/a          n/a      33729        7M9S     15213        6S
                    All            51995        t/o 24H  43549        15M25S   16829        10S
PH.merge_pairs      Selective      n/a          n/a      25860        1M3S     6414         <1S
                    All            43515        t/o 24H  34918        13M41S   6558         <1S

(a) Comparison of the number of constraints generated and time taken for the type
inference of the core operation of each benchmark plus the zig-zig case of splay.


Module  automated                        manual
        Assertions  Time     Memory      Assertions  Time  Memory
ST      54794       24M17S   3204        24677       43S   280
SH      37911       7M35S    1482        17877       12S   237
PH      29493       3M42S    760         7987        1S    29

(b) Number of assertions, solving time and maximum memory usage (in mebibytes)
for the combined analysis of functions per module.


6.2 Experimental Results 


Our main results have already been stated in Table 1 of Sect. 1. Table 2a compares
the “naive automation” with our actual automation (“automated mode”), see
Sect. 5. Within the latter, we distinguish between a “selective” and a “full” mode.
The “selective” mode is as described in Sect. 5.2 (Automated Mode). The “full”
mode employs weakening for the same rule applications as the “selective” mode,
but always with option w{mono l2xy}. The same applies to the “full” manual
mode. The naive automation does not support selection of expert knowledge.
Thus the “selective” option is not available, denoted as “n/a”. Timeouts are
denoted by “t/o”. As depicted in the table, the naive automation does not
terminate within 24h for the core operations of the three considered data
structures, whereas the improved automated mode produces optimised results
within minutes. In Table 2b, we compare the (improved) automated mode with
the manual mode, and report on the sizes of the resulting constraint system
and on the resources required to produce the same results. Observe that even
though our automated mode achieves reasonable solving times, there is still a
significant gap between the manually crafted tactics and the automated mode,
which invites future work.


7 Conclusion 


In this paper we have for the first time been able to automatically conduct an 
amortised analysis for self-adjusting data structures. Our analysis is based on 
the “sum of logarithms” potential function and we have been able to automate 
reasoning about these potential functions by using Farkas’ Lemma for the linear 
part of the calculations and adding necessary facts about the logarithm. Imme- 
diate future work is concerned with replacing the “sum of logarithms” potential 
function in order to analyse skew heaps and Fibonacci heaps [42]. In particu- 
lar, the potential function for skew heaps, which counts “right heavy” nodes, is 
interesting, because it is also used as a building block by Iacono in his improved 
analysis of pairing heaps [29,30]. Further, we envision extending our analysis to
related probabilistic settings such as priority queues [13] and skip lists [40].
Abstract. This paper presents a symbolic method for automatic theorem
generation based on deductive inference. Many software verification
and reasoning tasks require proving complex logical properties; coping 
with this complexity is generally done by declaring and proving relevant 
sub-properties. This gives rise to the challenge of discovering useful sub- 
properties that can assist the automated proof process. This is known 
as the theory exploration problem, and so far, predominant solutions 
that emerged rely on evaluation using concrete values. This limits the 
applicability of these theory exploration techniques to complex programs 
and properties. 

In this work, we introduce a new symbolic technique for theory explo- 
ration, capable of (offline) generation of a library of lemmas from a 
base set of inductive data types and recursive definitions. Our approach 
introduces a new method for using abstraction to overcome the above 
limitations, combining it with deductive synthesis to reason about 
abstract values. Our implementation has been shown to find more lemmas
than prior art, avoiding redundant lemmas (in terms of provability),
while being faster in most cases. This new abstraction-based theory
exploration method is a step toward applying theory exploration to soft- 
ware verification and synthesis. 


Keywords: Theory exploration · Synthesis · Automatic theorem proving


1 Introduction 


Most forms of software verification and synthesis rely on some form of logical rea- 
soning to complete their task. Whether it is checking pre- and post-conditions, 
deriving specifications for sub-problems [1,19], or equivalence reduction [39], 
these methods rely on assumptions from both the input and relevant background 
knowledge. Domain-specific knowledge can reinforce these methods, whether via 
the design of a domain-specific language [29,36,45], specialized decision proce- 
dures [28], or decomposing specifications [35]. While hand-crafted techniques can 
treat whole classes of programs, every library or module contributes a collection 
of new primitives, requiring tweaking or extending these methods. Automatic
formation of background knowledge can enable effortless treatment of such libraries
and programs.

In the context of verification tools, such as Dafny [27] and Leon [7], as well 
as interactive proof assistants, such as Coq [12] and Isabelle/HOL [33], back- 
ground knowledge is typically given as a set of lemmas. Usually, these libraries 
of lemmas (i.e. the background knowledge) are created by human engineers and 
researchers who are tasked with formulating them and proving their correctness. 
When a proof or verification task requires auxiliary lemmas missing from the
existing background knowledge, the user is required to add and prove them,
sometimes repeating this process until the proof is trivial or can be found
automatically. For example, both Dafny and Leon fail to prove that addition is
associative and commutative from first principles, based on an algebraic
construction of the natural numbers. However, when given knowledge of these
properties (i.e. encoded as lemmas: (x + y) + z = x + (y + z) and x + y = y + x;
see footnote 1), they readily prove composite facts such as (x + 5) + y = 5 + (x + y).

A possible solution is to eagerly generate valid lemmas, and to do so automat- 
ically, offline, as a precursor to any work that would be built on top of the library. 
This paradigm is known as theory exploration [8,9], and differs from the com- 
mon conjecture generation approach (in theorem provers and SMT solvers [37]) 
that is guided by a proof goal. As opposed to using a proof goal as the basis for
discovering sub-goals, when eagerly generating lemmas there is a vast space of
possible lemmas to consider. Currently, two main approaches exist for filtering
candidate conjectures: counterexample-based and observational-equivalence-based
[18,22,23,43]. These filtering techniques are all based on testing and therefore
require automatic creation of concrete examples.

Testing with concrete values allows for fast evaluation and filtering of terms 
when the data types involved are simple. However, when scaling to larger data 
types and function types it becomes a bottleneck of the theory exploration pro- 
cess. Previous research effort has revealed that testing-based discovery is sen- 
sitive to the number and size of type definitions occurring in the code base. 
For example, QuickSpec, which is based on QuickCheck (as are all the existing 
testing-based theory exploration methods), employs a heuristic to restrict the set 
of types allowed in terms in order to make the checker’s job easier. Compound 
data types such as lists can be nested up to two levels (lists of lists, but not lists 
of lists of lists). This presents an obstacle towards scaling the approach to real 
software libraries, since “QuickCheck’s size control interacts badly with deeply 
nested types [...] will generate extremely large test data.” [38] 

1 In fact, these properties are hard-wired into decision procedures for linear integer
arithmetic in SMT solvers.


Following are two example scenarios that attempt to represent cases from
software systems where structured data types and complicated APIs exist: (i) A
series of tree data types $T_i$, where each $T_i$ is a tree of height i with i children of
type $T_{i-1}$, and the base case is an empty tree. Creating concrete examples for $T_i$
will be resource expensive, as each tree has O(i!) nodes, and each node requires a
value. (ii) An ADT (Algebraic Data Type) A with multiple fields where each can
contain a large amount of text or other ADTs, and a function over A that only
accesses one of the fields. Even if evaluating the function is fast, fully creating
A is expensive and will impact the theory exploration run-time.
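
The factorial growth in scenario (i) can be checked with a short calculation. The sketch below assumes that the empty tree contributes no node and that a tree of type $T_i$ consists of one root plus i subtrees of type $T_{i-1}$; the function name `node_count` is introduced here for illustration.

```python
from math import factorial


def node_count(i):
    """Nodes of a T_i tree: N(0) = 0 and N(k) = 1 + k * N(k - 1)."""
    nodes = 0
    for k in range(1, i + 1):
        nodes = 1 + k * nodes
    return nodes


for i in range(1, 8):
    print(i, node_count(i), factorial(i))
```

Under these assumptions the count equals the sum of i!/k! for k from 1 to i, so it lies between i! and 2 · i!, which matches the O(i!) estimate: a single concrete value of type $T_{10}$ already needs more than six million node values.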

This paper presents a new symbolic theory exploration approach that takes 
advantage of the characteristics of induction-based proofs. To overcome the 
blowup in the space of possible values, we make use of symbolic values, which con- 
tain interpreted symbols, uninterpreted symbols, or a mixture of the two. Con- 
ceptually, each symbolic value is an abstraction representing (infinitely) many 
possible values. This means that preexisting knowledge on the symbolic value 
can be applied without fully creating interpreted values. Still, when necessary, 
uninterpreted values can be expanded, creating larger symbolic values, thus
refining the abstraction and facilitating the necessary computation. We focus on the
formation of equational theories, that is, lemmas that state the equivalence of
two terms, with universal quantification over all free variables.

We show that our symbolic method for theory exploration is more applicable
and faster in many different scenarios than the state of the art. As an example,
given standard definitions for the list functions ++, drop, take, and filter, our
method proves facts that were not found by the current state of the art, such as:


(take i xs) ++ (drop i xs) = xs
filter p (xs ++ ys) = (filter p xs) ++ (filter p ys)
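
For readers who want to see what the two lemmas state, the following sketch spot-checks them on small concrete lists. This is testing-style evidence of the kind the paper contrasts with its symbolic approach, not a proof and not how TheSy works; the Python definitions of `take`, `drop` and `filter_` are assumed stand-ins for the standard functional definitions, with i ranging over natural numbers.

```python
from itertools import product


def take(i, xs):
    """First i elements of xs (all of xs if it is shorter)."""
    return xs[:i]


def drop(i, xs):
    """xs without its first i elements."""
    return xs[i:]


def filter_(p, xs):
    """Elements of xs that satisfy p, in order."""
    return [x for x in xs if p(x)]


def lists_up_to(length, alphabet=(0, 1, 2)):
    """All lists over alphabet with at most the given length."""
    for k in range(length + 1):
        for items in product(alphabet, repeat=k):
            yield list(items)


def check_lemmas(max_len=4):
    """Exhaustively test both equations on all small lists."""
    def is_even(x):
        return x % 2 == 0

    for xs in lists_up_to(max_len):
        for i in range(max_len + 2):
            if take(i, xs) + drop(i, xs) != xs:
                return False
        for ys in lists_up_to(2):
            if filter_(is_even, xs + ys) != filter_(is_even, xs) + filter_(is_even, ys):
                return False
    return True


print(check_lemmas())
```

The check enumerates every list explicitly, which is feasible only because the element type and lengths are tiny; this is the cost that symbolic values are meant to avoid.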


Main Contributions. This paper provides the following contributions: 


— A system for theory synthesis using symbolic values to take advantage of 
value abstraction. Our implementation, TheSy, can discover more lemmas 
than were found by testing-based tools, while being faster in most cases. 

— A technique to compare universally quantified terms using term rewriting 
techniques and a given set of lemmas, called symbolic observational equiva- 
lence (SOE). SOE overapproximates term equalities deducible by the given 
lemmas (i.e., will find more equalities), thus can be used for equality reduc- 
tion in context of uninterpreted values, enabling fully symbolic reasoning over 
a large set of terms. 

— An evaluation of our theory exploration system on a set of benchmarks for 
induction proofs taken from CVC4 [37] and TIP 2015 [11], specifically the 
IsaPlanner benchmarks [21]. We compare our implementation with a current 
leading theory exploration system, Hipster [18], using a novel metric. This
metric is insensitive to the number of lemmas found, and instead measures
their usefulness in the context of theorem proving.


2 Overview 


Our theory exploration method, named TheSy (Theory Synthesizer, pronounced
Tessy), is based on syntax-guided enumerative synthesis. Similarly to previous
approaches [10,20,38], TheSy generates a comprehensive set of terms from
the given vocabulary and looks for pairs that seem equivalent. Notably, TheSy
employs deductive reasoning based on term rewriting systems to propose these
pairs by extrapolating from a set of known equalities, employing a relatively
lightweight (but unsound) reasoning procedure. The proposed pairs are passed
as equality conjectures to a theorem prover capable of reasoning by induction.


[Figure: base knowledge → Term Generation (SyGuE) → Conjecture Inference (SOE) → Conjecture Screening → Induction Prover (cong. closure) → new knowledge, run under iterative deepening; new knowledge augments the knowledge base.]


Fig. 1. TheSy system overview: breakdown into phases, with feedback loop.

The process (as shown in Fig. 1) is separated into four stages. These stages 
work in an iterative deepening fashion and are dependent on the results of each 
other. A short description is given to help the reader understand their context 
later on. 


1. Term Generation. Build symbolic terms of increasing depth, based on the 
given vocabulary. Use known equalities for pruning via equivalence reduction. 

2. Conjecture Inference. Evaluate terms on symbolic inputs, and apply 
deductive inference to extract new equalities, thus forming conjectures. 

3. Conjecture Screening. Some of the conjectures, even valid ones, are special 
cases of known equalities or are trivially implied by them. We deem these con- 
jectures redundant. TheSy culls such conjectures before continuing to prove 
the rest. 

4. Induction Prover. The prover attempts to prove conjectures that passed 
screening using a normal induction scheme derived from algebraic data struc- 
ture definitions in the given vocabulary. Conjectures that were successfully 
proven are then declared lemmas and added to the known equalities. 


The phases are run iteratively in a loop, where each iteration deepens the 
generated terms and, hence, the discovered lemmas. These lemmas are fed back 
to earlier phases; this form of feedback contributes to discovering more lemmas 
thanks to several factors: 


(i) Conjecture inference is dependent upon known equalities. Additional equal- 
ities enable finding new conjectures. 

(ii) Screening becomes more accurate, since equivalence classes are merged
based on known equalities.

(iii) The prover is based on known equalities with a congruence closure proce- 
dure. The more lemmas are known to the system, the more lemmas become 
provable by this method. 

(iv) Term generation benefits from the equivalence reduction, avoiding duplicate 
work for equivalent terms. 
V = { []     : list T,
      ::     : T → list T → list T,
      ++     : list T → list T → list T,
      filter : (T → bool) → list T → list T },     C = { [], :: }


E = { [] ++ l = l,    (x :: xs) ++ l = x :: (xs ++ l),
      filter p [] = [],    filter p (x :: xs) = if p x then x :: filter p xs else filter p xs }


Fig. 2. An example input to TheSy. 


Running Example. To illustrate TheSy’s theory exploration procedure, we introduce
a simple running example based on a list ADT. The input given to TheSy
is shown in Fig. 2; it consists of a vocabulary V (of which C is a subset of ADT
constructors) and a set of known equalities E. The vocabulary V contains the
canonical list constructors [] and ::, and two basic list operations ++ (concatenate)
and filter. The equalities E consist of the definitions of the latter two.

At a very high level, the following process takes place: TheSy
generates symbolic terms representing length-bounded lists, e.g., [], [v1], [v2, v1].
Then, it will evaluate all combinations of function applications, up to a small 
depth, using these symbolic terms as arguments. If these evaluations yield com- 
mon values for all possible assignments, the two application terms yielding them 
are conjectured to be equal. Since the evaluated expressions contain symbolic 
values, their result is a symbolic value. Comparing such symbolic values is done 
via congruence closure-based reasoning; we call this process symbolic observa- 
tional equivalence, by way of analogy to observational equivalence [2] that is 
carried out using concrete values. 

Out of the conjectures computed using symbolic observational equivalence, 
TheSy selects minimal ones according to a combined metric of compactness and 
generality. These are passed to a prover that employs both congruence closure
and induction to verify the correctness of the lemmas for all possible list values.

Some lemmas that TheSy can discover this way are: 


filter p (filter p l) = filter p l        l1 ++ (l2 ++ l3) = (l1 ++ l2) ++ l3
filter p l1 ++ filter p l2 = filter p (l1 ++ l2)


As briefly mentioned, our system design relies on congruence closure-based 
reasoning over universally quantified first-order formulas with uninterpreted 
functions. Congruence closure is weak but fast and constitutes one of the core 
procedures in SMT solvers [31,32]. On top of that, universally-quantified assump- 
tions [4] are handled by formulating them as rewrite rules and applying some 
depth-bounded term rewriting as described in Subsect. 3.1. Additionally, TheSy 
implements a simple case splitting mechanism that enables reasoning on condi- 
tional expressions. Notably, this procedure cannot reason about recursive defi- 
nitions since such reasoning routinely requires the use of induction. To that end, 
TheSy is geared towards discovering lemmas that can be proven by induction; 
a lemma is considered useful if it cannot be proven from existing lemmas by 
congruence closure alone, that is, without induction. Discovering such lemmas
and adding them to the background knowledge evidently increases the reason- 
ing power of the prover, since at least the fact of their own validity becomes 
provable, which it was not before. 


3 Preliminaries 


This work relies heavily on term rewriting techniques, which are employed across
multiple phases of the exploration. Term rewriting is implemented efficiently
using equality graphs (e-graphs). In this section, we present some minimal background
on both, which will be relevant for the exploration procedure described
later.


3.1 Term Rewriting Systems 


Consider a formal language $\mathcal{L}$ of terms over some vocabulary of symbols. We
use the notation $R = t_1 \to t_2$ to denote a rewrite rule from $t_1$ to $t_2$. For a
(universally quantified) semantic equality law $t_1 = t_2$, we would normally create
both $t_1 \to t_2$ and $t_2 \to t_1$. We refrain from assigning a direction to equalities since
we do not wish to restrict the procedure to strongly normalizing systems, as
is traditionally done in frameworks based on the Knuth-Bendix algorithm [24].
Instead, we consider two terms equivalent when a sequence of rewrites can identify
them in either direction. A small caveat involves situations where $FV(t_1) \neq FV(t_2)$,
that is, one side of the equality contains variables that do not occur on the
other. We choose to admit only rules $t_i \to t_j$ where $FV(t_i) \supseteq FV(t_j)$, because
when $FV(t_i) \subset FV(t_j)$, applying the rewrite would have to create new symbols
for the unassigned variables in $t_j$, which results in a large growth in the number
of symbols and typically makes rewrites much slower as a result.
This slight asymmetry is what motivates the following definitions.


Definition 1. Given a rewrite rule $R = t_1 \to t_2$, we define a corresponding
relation $\xrightarrow{R}$ such that $s_1 \xrightarrow{R} s_2 \iff s_1 = C[t_1\sigma] \land s_2 = C[t_2\sigma]$ for some context
$C$ and substitution $\sigma$ for the free variables of $t_1, t_2$. (A context is a term with a
single hole, and $C[t]$ denotes the term obtained by filling the hole with $t$.)
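
Definition 1 can be illustrated with a small Python sketch that applies one rule once, at any position of a term. The encoding is an assumption made for illustration: terms are nested tuples `(operator, arg, ...)`, constants are plain strings, and pattern variables are strings starting with `?`; none of the function names come from TheSy.

```python
def is_var(t):
    """Pattern variables are strings that start with '?'."""
    return isinstance(t, str) and t.startswith("?")

def free_vars(t):
    """Set of pattern variables occurring in a term or pattern."""
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(free_vars(a) for a in t[1:]))
    return set()

def admissible(rule):
    """The side condition FV(lhs) ⊇ FV(rhs) on rewrite rules."""
    lhs, rhs = rule
    return free_vars(lhs) >= free_vars(rhs)

def match(pattern, term, subst=None):
    """Return a substitution sigma with pattern·sigma == term, or None."""
    subst = dict(subst or {})
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def instantiate(pattern, subst):
    """Apply a substitution to a pattern."""
    if isinstance(pattern, str):
        return subst.get(pattern, pattern)
    return (pattern[0],) + tuple(instantiate(a, subst) for a in pattern[1:])

def rewrite_once(rule, term):
    """All s2 such that term rewrites to s2 by one use of `rule` in some context."""
    lhs, rhs = rule
    results = []
    subst = match(lhs, term)
    if subst is not None:            # rewrite at the root (empty context)
        results.append(instantiate(rhs, subst))
    if isinstance(term, tuple):      # rewrite inside an argument (non-empty context)
        for i in range(1, len(term)):
            for new_arg in rewrite_once(rule, term[i]):
                results.append(term[:i] + (new_arg,) + term[i + 1:])
    return results
```

For instance, with the second defining equation of ++ as the rule `(("++", ("::", "?x", "?xs"), "?l"), ("::", "?x", ("++", "?xs", "?l")))`, the term `("filter", "p", ("++", ("::", "a", "nil"), "ys"))` has a single rewrite, in which the context is `filter p □`.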


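Definition 2. Given a relation $\xrightarrow{R}$, we define its symmetric closure:


$$t_1 \overset{R}{\longleftrightarrow} t_2 \iff t_1 \xrightarrow{R} t_2 \;\lor\; t_2 \xrightarrow{R} t_1$$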


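Definition 3. Given a set of rewrite rules $\{R_i\}$, we define a relation as the
union of the relations of the rewrites: $\overset{\{R_i\}}{\longleftrightarrow} \;\triangleq\; \bigcup_i \overset{R_i}{\longleftrightarrow}$.


In the sequel, we will mostly use its reflexive transitive closure, $\overset{\{R_i\}}{\longleftrightarrow}{}^{*}$.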
Fig. 3. An e-graph representing the expression filter p (l1 ++ l2) (dark) and the
equivalent expression filter p l1 ++ filter p l2 (light).


The relation $\overset{\{R_i\}}{\longleftrightarrow}{}^{*}$ is reflexive, transitive, and symmetric, so it is an
equivalence relation over $\mathcal{L}$. Under the assumption that all rewrite rules in $\{R_i\}$ are
semantics preserving, for any equivalence class $[t] \in \mathcal{L} / \overset{\{R_i\}}{\longleftrightarrow}{}^{*}$, all terms
belonging to $[t]$ are definitionally equal. However, since $\mathcal{L}$ may be infinite, it is essentially
impossible to compute $\overset{\{R_i\}}{\longleftrightarrow}{}^{*}$. Any algorithm can only explore a finite subset
$\mathcal{T} \subseteq \mathcal{L}$, and in turn, construct a subset of $\overset{\{R_i\}}{\longleftrightarrow}{}^{*}$.


3.2 Compact Representation Using Equality Graphs 


In order to be able to cover a large set of terms $\mathcal{T}$, we need a compact data
structure that can efficiently represent many terms. Normally, terms are represented
by their ASTs (Abstract Syntax Trees), but as there would be many
instances of common subterms among the terms of $\mathcal{T}$, this would be highly
inefficient. Instead, we adopt the concept of equality graphs (e-graphs) from
automated theorem proving [15], which also saw uses in compiler optimizations
and program synthesis [30,34,41], in which context they are known as Program
Expression Graphs (PEGs). An e-graph is essentially a hypergraph where each
vertex represents a set of equivalent terms (programs), and labeled, directed
hyperedges represent function applications. Hyperedges therefore have exactly
one target and zero or more sources, which form an ordered multiset (a vector,
basically). Just to illustrate, the expression filter p (l1 ++ l2) will be represented
by the nodes and edges shown in dark in Fig. 3. The nullary edges represent the
constant symbols (p, l1, l2), and the node u0 represents the entire term. The
expression filter p l1 ++ filter p l2, which is equivalent, is represented by the light
nodes and edges, and the equivalence is captured by sharing of the node u0.
When used in combination with a rewrite system $\{R_i\}$, each rewrite rule
is represented as a premise pattern $P$ and a conclusion pattern $C$. Applying a
rewrite rule is then reduced to searching the e-graph for the premise pattern and
obtaining a substitution $\sigma$ for the free variables of $P$. The result term is then
obtained by substituting the free variables of $C$ using $\sigma$. This term is added to
the same equivalence class as the matched term (i.e. $P\sigma$), meaning they will
both have the same root node. Consequently, a single node can represent a set
of terms exponentially large in the number of edges, all of which will always be
equivalent modulo $\overset{\{R_i\}}{\longleftrightarrow}{}^{*}$.

In addition, since hyperedges always represent functions, a situation may
arise in which two vertices represent the same term. This happens if two edges
$\bar{u} \xrightarrow{f} v_1$ and $\bar{u} \xrightarrow{f} v_2$ are introduced by $\{R_i\}$ for $v_1 \neq v_2$. In a purely functional
setting, this means that $v_1$ and $v_2$ are equal. Therefore, when such duplication is
found, it is beneficial to merge $v_1$ and $v_2$, eliminating the duplicate hyperedge.
The e-graph data structure therefore supports a vertex merge operation and
a congruence closure-based transformation [44] that finds vertices eligible for
merge to keep the overall graph size small. This procedure can be quite expensive,
so it is only run periodically.
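
The merge operation and the periodic congruence-closure pass can be sketched with a union-find structure and a table of hyperedges. This is a toy Python illustration under our own naming (`EGraph`, `add`, `merge`, `rebuild`); it is not the API of the egg library and omits pattern matching entirely.

```python
class EGraph:
    """Toy e-graph: a union-find over vertex ids plus a table of hyperedges."""

    def __init__(self):
        self.parent = []    # union-find parent pointers, one per vertex
        self.hashcons = {}  # (op, canonical source ids...) -> target vertex id

    def find(self, a):
        """Canonical representative of vertex a (with path halving)."""
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def add(self, op, *children):
        """Add the hyperedge op(children) and return its target vertex."""
        node = (op,) + tuple(self.find(c) for c in children)
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        vid = len(self.parent)
        self.parent.append(vid)
        self.hashcons[node] = vid
        return vid

    def merge(self, a, b):
        """Declare vertices a and b equal."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        """Congruence closure: edges with the same label and sources get one target."""
        changed = True
        while changed:
            changed = False
            new_hashcons = {}
            for node, vid in self.hashcons.items():
                canon = (node[0],) + tuple(self.find(c) for c in node[1:])
                vid = self.find(vid)
                if canon in new_hashcons and self.find(new_hashcons[canon]) != vid:
                    self.merge(new_hashcons[canon], vid)  # duplicate hyperedge found
                    changed = True
                else:
                    new_hashcons[canon] = vid
            self.hashcons = new_hashcons
```

After adding `f(x)` and `f(y)` as separate vertices, merging `x` with `y` and calling `rebuild()` places both applications in the same vertex, which is the duplicate-hyperedge situation described above.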


4 Theory Synthesis 


In this section, we go into a more detailed description of the phases of theory 
synthesis and explain how they are combined within an iterative deepening loop. 
To simplify the presentation, we describe all the phases first, then explain how 
the output from the last phase is fed back to the next iteration to complete a 
feedback loop. We continue with the input from the running example in Sect. 2 
(Fig. 2) and dive deeper by showing intermediate states encountered during the 
execution of TheSy on this input. Throughout the execution, TheSy maintains 
a state, consisting of the following elements: 


— V, a sorted vocabulary

— C ⊆ V, a subset of constructors for some or all of the types

— E, a set of equations initially consisting only of the definitions of the
(non-constructor) functions in V

— $\mathcal{T}$, a set of terms, initially containing just atomic terms corresponding to
symbols from V.


4.1 Term Generation 


The first step is to generate a set of terms over the vocabulary V. For the
purpose of generating universally-quantified conjectures, we introduce a set of
uninterpreted symbols, which we will call placeholders. Let $T_V$ be the set of types
occurring as the type of some argument of a function symbol in V. For each type
$\tau \in T_V$ we generate placeholders $\circ_i^{\tau}$, two for each type (we will explain
later why two are enough). These placeholders, together with all the symbols in
V, constitute the terms at depth 0.

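At every iteration of deepening, TheSy uses the set of terms generated so
far, and the (non-nullary) symbols of V, to form new terms by placing existing
ones in argument positions. For example, with the definitions from Fig. 2, we
will have terms such as these at depths 1 and 2:


$$
\begin{array}{lll}
1 & \mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_1^{\mathrm{list}\,T} & \circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T} \\
2 & [\,] \mathbin{+\!\!+} \mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_1^{\mathrm{list}\,T} & \circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_2^{\mathrm{list}\,T}) \\
  & \mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) & (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_1^{\mathrm{list}\,T}) \mathbin{+\!\!+} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_2^{\mathrm{list}\,T})
\end{array}
\tag{1}
$$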
It is easy to see that $\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_1^{\mathrm{list}\,T}$ and $[\,] \mathbin{+\!\!+} \mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_1^{\mathrm{list}\,T}$ are equivalent in
any context; this follows directly from the definition of ++, available as part of E.
It is therefore acceptable to discard one of them without affecting completeness. 
TheSy does not discard terms—since they are merged in the e-graph, there is 
no need to—rather, it chooses the smaller term as representative when it needs 
one. This sort of equivalence reduction is present, in some way or another, in 
many automated reasoning and synthesis tools. 
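
The depth-wise generation of terms can be sketched as a typed enumeration. The Python below is a simplified illustration, not TheSy's implementation: it uses plain lists rather than an e-graph, performs no equivalence reduction, omits the `::` constructor, and the type names `"list"` and `"pred"` and placeholder names `l1`, `l2`, `p1`, `p2` are assumptions standing in for list T, T → bool, and the placeholders.

```python
from itertools import product

# Part of the signature from Fig. 2: name -> (argument types, result type).
FUNCS = {
    "++": (("list", "list"), "list"),
    "filter": (("pred", "list"), "list"),
}

# Depth-0 terms: two placeholders per argument type, plus the constant [].
ATOMS = {"list": ["l1", "l2", "[]"], "pred": ["p1", "p2"]}

def terms_up_to(depth):
    """Map each type to the well-typed terms of depth <= `depth` (nested tuples)."""
    by_type = {ty: list(ts) for ty, ts in ATOMS.items()}
    for _ in range(depth):
        new = {ty: list(ts) for ty, ts in by_type.items()}
        for name, (arg_types, res) in FUNCS.items():
            # Place every combination of existing terms in argument positions.
            for args in product(*(by_type[t] for t in arg_types)):
                term = (name,) + args
                if term not in new[res]:
                    new[res].append(term)
        by_type = new
    return by_type
```

Without equivalence reduction the number of terms grows quickly with depth, which is why merging equivalent terms in the e-graph matters in practice.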

To formalize the procedure of generating and comparing the terms, in an 
attempt to discover new equality conjectures, we introduce the concept of Syn- 
tax Guided Enumeration (SyGuE). SyGuE is similar to Syntax Guided Synthesis 
(SyGuS for short [3]) in that they both use a formal definition of a language to 
find program terms solving a problem. They differ in the problem definition: 
while SyGuS is defined as a search for a correct program over the well-formed 
programs in the language, SyGuE is the sub-problem of iterating over all distinct
programs in the language. SyGuS solvers may be improved using a smart
search algorithm, while SyGuE solvers need an efficient way to eliminate duplicate
terms, which may depend on the definition of program equivalence. We
implement our variant of SyGuE, over the equivalence relation $\overset{\{R_i\}}{\longleftrightarrow}{}^{*}$, using
the aforementioned e-graph: by applying and re-applying rewrite rules, provably
equivalent terms are naturally merged into hyper-vertices, representing equivalence
classes.


4.2 Conjecture Inference and Screening 


Of course, in order to discover new conjectures, we cannot rely solely on term
rewriting based on E. To find more equivalent terms, TheSy carries on to generate
a second set of terms, called symbolic examples, this time using only the
constructors C ⊆ V and uninterpreted symbols for leaves. This set is denoted $S^{\tau}$,
where $\tau$ is an algebraic datatype participating in V (if several such datatypes are
present, one $S^{\tau}$ per type is constructed). The depth of the symbolic examples
(i.e. depth of applied constructors) is also bounded, but it is independent of the
current term depth and does not increase during execution. For example, using
the constructors of list T with an example depth of 2, we obtain the symbolic
examples $S^{\mathrm{list}\,T} = \{[\,],\ v_1 :: [\,],\ v_2 :: v_1 :: [\,]\}$, corresponding to lists of length up to 2
having arbitrary element values. Intuitively, if two terms are equivalent for all
possible assignments of symbolic examples to $\circ_i^{\mathrm{list}\,T}$, then we hypothesize
that they are equivalent for all list values. This process is very similar to
observational equivalence as used by program synthesis tools [2,42], but since
it uses the symbolic value terms instead of concrete values, we dub it symbolic
observational equivalence (SOE).

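Consider, for example, the simple terms $\circ_1^{\mathrm{list}\,T}$ and $\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} [\,]$. In placeholder form,
none of the rewrite rules derived from E applies, so it cannot be determined that
these terms are, in fact, equivalent. However, with the symbolic list examples
above, the following rewrites are enabled:

$$[\,] \mathbin{+\!\!+} [\,] \overset{\{R_i\}}{\longleftrightarrow}{}^{*} [\,] \qquad (v_1 :: [\,]) \mathbin{+\!\!+} [\,] \overset{\{R_i\}}{\longleftrightarrow}{}^{*} v_1 :: [\,] \qquad (v_2 :: v_1 :: [\,]) \mathbin{+\!\!+} [\,] \overset{\{R_i\}}{\longleftrightarrow}{}^{*} v_2 :: v_1 :: [\,]$$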
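
The effect of substituting symbolic examples can be reproduced with a few lines of Python. This is a sketch under an assumed encoding (the string `"nil"` for [], tuples `("cons", v, rest)` for ::, and a hypothetical placeholder name `"o1"`); it evaluates ++ by its two defining equations and leaves the term stuck when neither applies.

```python
def symbolic_lists(max_len):
    """Symbolic examples [], v1::[], v2::v1::[], ... built from constructors only."""
    examples, current = ["nil"], "nil"
    for i in range(1, max_len + 1):
        current = ("cons", f"v{i}", current)
        examples.append(current)
    return examples

def append(xs, ys):
    """Evaluate xs ++ ys using the two defining equations of ++ (Fig. 2).

    Returns the stuck term ("++", xs, ys) when xs is not headed by a
    constructor, e.g. when xs is an uninterpreted placeholder.
    """
    if xs == "nil":
        return ys
    if isinstance(xs, tuple) and xs[0] == "cons":
        return ("cons", xs[1], append(xs[2], ys))
    return ("++", xs, ys)

# On every symbolic example, `ex ++ []` evaluates to `ex` itself ...
agree_on_examples = all(append(ex, "nil") == ex for ex in symbolic_lists(2))

# ... whereas on a bare placeholder the rewrite is blocked.
stuck = append("o1", "nil")
```

Because the element values `v1`, `v2` stay uninterpreted, each symbolic example stands for every concrete list of that length, which is what distinguishes this from testing on concrete inputs.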
A similar case can be made for the two bottom terms in (1). For symbolic
values $l_1, l_2 \in S^{\mathrm{list}\,T}$, it can be shown that


$$\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ (l_1 \mathbin{+\!\!+} l_2) \;\overset{\{R_i\}}{\longleftrightarrow}{}^{*}\; (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ l_1) \mathbin{+\!\!+} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ l_2)$$


In fact, it is sufficient to substitute for $\circ_1^{\mathrm{list}\,T}$, while leaving $\circ_2^{\mathrm{list}\,T}$ alone,
uninterpreted: e.g., $\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ ([\,] \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) \overset{\{R_i\}}{\longleftrightarrow}{}^{*} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ [\,]) \mathbin{+\!\!+} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_2^{\mathrm{list}\,T})$. This
reduces the number of equivalence checks significantly, and is more than a mere
heuristic: since we are going to rely on a prover that proceeds by applying induction
to one of the arguments, it makes perfect sense to only bound that argument.
If computation is blocked on the second argument, we would prefer to first infer
an auxiliary lemma, then use it to discover the blocked lemma later. See
Example 1 below for an idea of when this situation arises.


The attentive reader may notice that the cases of $v_1 :: [\,]$ and $v_2 :: v_1 :: [\,]$ are a
bit more involved: to proceed with the rewrite of filter, the expressions $\circ_1^{T \to \mathrm{bool}}\ v_1$
and $\circ_1^{T \to \mathrm{bool}}\ v_2$ must be resolved to either true or false. However, the predicate $\circ_1^{T \to \mathrm{bool}}$,
as well as the arguments $v_1, v_2$, are uninterpreted. In this case, TheSy is required
to perform a case split in order to enable the rewrites and unify the symbolic
terms separately in each of the resulting four ($2^2$) cases. Notice that leaving $\circ_1^{T \to \mathrm{bool}}$
uninterpreted means that the cases are only split when evaluation is blocked by
one or more rewrite rule applications, potentially saving some branching. The
following steps are then carried out for each case.

TheSy applies all the available rewrite rules to the entire e-graph, containing
all the terms and symbolic examples. For every two terms $t_1, t_2$ such that for all
viable substitutions $\sigma$ of placeholders to symbolic examples of the corresponding
types, $t_1\sigma$ and $t_2\sigma$ were shown equal (that is, ended up in the same equivalence
class of the e-graph), the conjecture $t_1 = t_2$ is emitted. E.g., in the case of the
running example:

$$\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) = (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_1^{\mathrm{list}\,T}) \mathbin{+\!\!+} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_2^{\mathrm{list}\,T})$$

In the presence of multiple cases, the results are intersected, so that a conjecture
is emitted only if it follows from all the cases.


Screening. Generating all the pairs according to the above criteria potentially
creates many “obvious” equalities, which are valid propositions, but do not contribute
to the overall knowledge and just clutter the prover’s state. For example,


$$\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) = \mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} ([\,] \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}))$$


which follows from the definition of ++ and has nothing to do with filter. The
synthesizer avoids generating such candidates, by choosing at most one term
from every equivalence class of placeholder-form terms induced during the term
generation phase. If both sides of the equality conjecture belong to the same
equivalence class, the conjecture is dropped altogether.

The conjectures that remain are those equalities $t_1 = t_2$ where $t_1$ and $t_2$
got merged for all the assignments $S^{\tau}$ to some $\circ_i^{\tau}$ and, furthermore, $t_1$ and
$t_2$ themselves were not merged in placeholder form, prior to substitution. Such
conjectures, if true, are guaranteed to increase the knowledge represented by E as
(at least) the equality $t_1 = t_2$ was not previously provable using term rewriting
and congruence closure.


4.3 Induction Prover


For practical reasons, the prover employs the following induction tactic:


— Structural induction based on the provided constructors (C). 

— The first placeholder of the inductive type is selected as the decreasing argu- 
ment. 

— Exactly one level of induction is attempted for each candidate. 


The reasoning behind this design choice is that for every multi-variable
term, e.g. $\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}$, the synthesizer also generates the symmetric counterpart
$\circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_1^{\mathrm{list}\,T}$. So electing to perform induction on $\circ_1^{\mathrm{list}\,T}$ does not impede generality.

In addition, if more than one level of induction is needed, the proof can
(almost) always be revised by factoring out the inner induction as an auxiliary
lemma. Since the synthesizer produces all candidate equalities, that inner lemma
will also be discovered and proved with one level of induction. Lemmas so proven
are added to E and are available to the prover, so that multiple passes over the
candidates can gradually grow the set of provable equalities.

When starting a proof, the prover never needs to look at the base case,
as this case has already been checked during conjecture inference. Recall that
placeholders $\circ_i^{\tau}$ are instantiated with bounded-depth expressions using the
constructors of $\tau$, and these include all base cases (non-recursive constructors) by
default. For the example discussed above, the case of
$\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ ([\,] \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) = (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ [\,]) \mathbin{+\!\!+} (\mathrm{filter}\ \circ_1^{T \to \mathrm{bool}}\ \circ_2^{\mathrm{list}\,T})$
has been discharged early on, otherwise the
conjecture would not have come to pass. The prover then turns to the induction
step, which is pretty routine but is included in Fig. 4 for completeness of the
presentation.

It is worth noting that the conjecture inference, screening and induction
phases utilize a common reasoning core based on rewriting and congruence closure.
In situations where the definitions include conditions such as match p x
in Fig. 4 (in this case, desugared from if p x), the prover also performs automatic
case split and distributes equalities over the branches. Details and specific
optimizations are described in Sect. 5.




Assume  filter p (xs ++ l1) = filter p xs ++ filter p l1
Prove   filter p ((x :: xs) ++ l1) = filter p (x :: xs) ++ filter p l1
via  (1)  filter p ((x :: xs) ++ l1) = filter p (x :: (xs ++ l1))
     (2)    = match (p x) with true  ⇒ x :: filter p (xs ++ l1)
                               false ⇒ filter p (xs ++ l1)
(IH) (3)    = match (p x) with true  ⇒ x :: (filter p xs ++ filter p l1)
                               false ⇒ filter p xs ++ filter p l1


     (4)  filter p (x :: xs) ++ filter p l1
            = (match (p x) with true  ⇒ x :: filter p xs
                                false ⇒ filter p xs) ++ filter p l1
     (5)    = match (p x) with true  ⇒ x :: (filter p xs ++ filter p l1)
                               false ⇒ filter p xs ++ filter p l1


Fig. 4. Example proof by induction based on congruence closure and case splitting. 


Speculative Generalization. When the prover receives a conjecture with multiple
occurrences of a placeholder, e.g. $\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} (\circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_1^{\mathrm{list}\,T}) \stackrel{?}{=} (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) \mathbin{+\!\!+} \circ_1^{\mathrm{list}\,T}$, it is
designed to first speculate a more general form for it by replacing the multiple
occurrences with fresh placeholders. Recall that in Subsect. 4.1 we argued that
two placeholders of each type are sufficient; this is the mechanism that
enables it. There is more than one way to generalize a given conjecture: for this
example, there are two ways (up to alpha-renaming):


$$\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} (\circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_3^{\mathrm{list}\,T}) \stackrel{?}{=} (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) \mathbin{+\!\!+} \circ_3^{\mathrm{list}\,T} \qquad \circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} (\circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_3^{\mathrm{list}\,T}) \stackrel{?}{=} (\circ_3^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) \mathbin{+\!\!+} \circ_1^{\mathrm{list}\,T}$$


The prover must attempt both. Failing that, it would fall back to the original
conjecture. Formally, given an equality conjecture $s = t$ we can consider an
assignment $\sigma$ such that $r = s\sigma$, $q = t\sigma$, where the original conjecture uses an
assignment with only two values per type. The prover thus must iterate through
different assignments $\sigma_i$ with more possible values per type, and attempt to
prove a new conjecture $r\sigma_i = q\sigma_i$. This incurs more work for the prover but is
well worth its cost compared to a-priori generation of terms with three placeholders.
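
The enumeration of candidate generalizations can be sketched as follows. This Python fragment is an illustration under simplifying assumptions, not TheSy's procedure: terms are nested tuples with placeholder names as strings, the repeated placeholder must occur equally often on both sides, and only the fully generalized candidates (every occurrence made distinct) are produced. The names `generalizations`, `leaves`, and `relabel` are hypothetical.

```python
from itertools import permutations

def leaves(term):
    """Leaf symbols of a nested-tuple term, left to right."""
    if isinstance(term, str):
        return [term]
    return [leaf for arg in term[1:] for leaf in leaves(arg)]

def relabel(term, names):
    """Replace the leaves of `term`, left to right, by the symbols in `names`."""
    it = iter(names)
    def go(t):
        if isinstance(t, str):
            return next(it)
        return (t[0],) + tuple(go(a) for a in t[1:])
    return go(term)

def generalizations(lhs, rhs, var, fresh):
    """Candidates in which each occurrence of `var` becomes a distinct placeholder.

    `fresh` supplies the additional placeholder names. The left-hand side is
    relabelled in a fixed order; the right-hand side tries every matching.
    """
    l_leaves, r_leaves = leaves(lhs), leaves(rhs)
    k = l_leaves.count(var)
    if k != r_leaves.count(var) or len(fresh) < k - 1:
        return []
    names = [var] + list(fresh[:k - 1])
    l_names = iter(names)
    new_lhs = relabel(lhs, [next(l_names) if s == var else s for s in l_leaves])
    out = []
    for perm in permutations(names):
        r_names = iter(perm)
        new_rhs = relabel(rhs, [next(r_names) if s == var else s for s in r_leaves])
        out.append((new_lhs, new_rhs))
    return out
```

Applied to the conjecture above with `var="o1"` and `fresh=["o3"]`, this yields exactly the two candidates shown in the text; a prover would try each and fall back to the original conjecture if both fail.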


4.4 Looping Back 


The equations obtained from Subsect. 4.3 are fed back in four different but
interrelated ways. The first, inner feedback loop is from the induction prover to
itself: the system will attempt to prove the smaller lemmas first, so that when
proving the larger ones, these will already be available as part of E. This enables
more proofs to go through. The second feedback loop uses the lemmas obtained
to filter out proofs that are no longer needed. The third, outer loop is more
interesting: as equalities are made into rewrite rules, additional equations may
now pass the inference phase, since the symbolic evaluation core can equate more
terms based on this additional knowledge. The fourth resonates with the third:
applying the new rewrite rules acts as an equality reduction mechanism, reducing
the number of hyperedges added to the e-graph during term generation.

It is worth noting that while concrete observational equivalence uses a triv- 
ially simple equivalence checking mechanism with the trade-off that it may gen- 
erate many incorrect equalities, our symbolic observational equivalence is conser- 
vative in the sense that a symbolic value may represent infinitely many concrete 
inputs, and only if the synthesizer can prove that two terms will evaluate to equal 
values on all of them, by way of constructing a small proof, are they marked as 
equivalent. This means that some actually-equivalent terms may be “blocked” 
by the inference phase, which cannot happen when using concrete values—but 
also means that having additional inference rules (E) can improve this equivalence
checking, potentially leading to more discovered lemmas. This property of
TheSy is appealing because it allows an explored theory to evolve from basic 
lemmas to more complex ones. 


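Example 1 (Lemma seeding). To understand this last point, consider the standard
definition of list reversal for the list datatype:


rev [] = []
rev (x :: xs) = rev xs ++ (x :: [])

Given the terms $t_1 = \mathrm{rev}\ (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T})$ and $t_2 = \mathrm{rev}\ \circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \mathrm{rev}\ \circ_1^{\mathrm{list}\,T}$, symbolic
observational equivalence with the assignments $\{\circ_1^{\mathrm{list}\,T} \mapsto S^{\mathrm{list}\,T}\}$ fails to unify
them. This is due to ++ being defined by induction on its first argument, hence,
e.g.,


$$\mathrm{rev}\ ((v_2 :: v_1 :: [\,]) \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) \longrightarrow^{*} (\mathrm{rev}\ \circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} (v_1 :: [\,])) \mathbin{+\!\!+} (v_2 :: [\,])$$
$$\mathrm{rev}\ \circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \mathrm{rev}\ (v_2 :: v_1 :: [\,]) \longrightarrow^{*} \mathrm{rev}\ \circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} (v_1 :: v_2 :: [\,])$$


Without the associativity property of ++, it would not be possible to show
that these symbolic values are equivalent, so the conjecture $t_1 \stackrel{?}{=} t_2$ will not even
be generated. Luckily, having proven $\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} (\circ_2^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_3^{\mathrm{list}\,T}) \stackrel{?}{=} (\circ_1^{\mathrm{list}\,T} \mathbin{+\!\!+} \circ_2^{\mathrm{list}\,T}) \mathbin{+\!\!+} \circ_3^{\mathrm{list}\,T}$,
these rewrites are “unblocked”, so that the equality can be conjectured and
ultimately proven.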


One caveat is that whenever E is updated by the addition of a new lemma,
some of the previously emitted conjectures may consequently become redundant. 
Moreover, conjectures that were passed to the prover before but failed validation 
may now succeed, and new ones may be emitted in the generation phase. To take 
these into account, the actual loop performed by TheSy is a bit more involved 
than has been described so far. For each term depth, TheSy performs all phases 
as described, but each time a lemma is discovered TheSy re-runs the conjecture 
generation, screening, and prover phases. Only when no more conjectures are 
available does TheSy increase the term depth and generate new terms. 
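
The control flow just described can be summarized in a short Python sketch. The four phases are passed in as callables, so `generate`, `infer`, `screen`, and `prove` are placeholders for the real phases and their signatures are assumptions of this sketch, not TheSy's interfaces.

```python
def explore(max_depth, generate, infer, screen, prove):
    """Iterative-deepening loop: re-run inference after every new lemma.

    generate(terms, depth, lemmas) -> terms for this depth
    infer(terms, lemmas)           -> candidate conjectures
    screen(candidates, lemmas)     -> candidates that are not redundant
    prove(conjecture, lemmas)      -> True if the conjecture was proved
    """
    lemmas, terms = [], []
    for depth in range(1, max_depth + 1):
        terms = generate(terms, depth, lemmas)
        while True:
            candidates = [c for c in screen(infer(terms, lemmas), lemmas)
                          if c not in lemmas]
            # Take the first conjecture the prover can currently establish.
            proved = next((c for c in candidates if prove(c, lemmas)), None)
            if proved is None:
                break               # nothing new is provable: deepen the terms
            lemmas.append(proved)   # new lemma: inference and screening re-run
    return lemmas
```

Re-running inference after each success is what lets a conjecture that failed earlier be retried once the lemma it depends on has been added.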


5 Evaluation


We implemented TheSy in Rust, using the e-graph manipulation library egg [44].
TheSy accepts definitions in SMTLIB-2.6 format [6], based on the UF theory
(uninterpreted functions), limited to universal quantifications. Type declarations
occurring in the input are collected and comprise V; universal equalities form E
and are translated into rewrite rules (either uni- or bidirectional, as explained
in Subsect. 3.1). Then SyGuE is performed on V, generating candidate conjectures
using SOE. SyGuE uses egg for equivalence reduction, and SOE uses it
for comparing symbolic values. Conjectures are then discharged using TheSy’s
induction-based prover. This is done in an iterative deepening loop.


Case Split. Both SOE and the prover use a case splitting mechanism. This mechanism
detects when rewriting cannot match due to an opaque value (an uninterpreted
symbol), and applies case splitting according to the constructors of
relevant ADTs. However, doing so for every rule is too costly and, in most cases,
redundant: TheSy generates a variety of terms, so if one term is blocked due to
an uninterpreted symbol, another one exists with a symbolic example instead.
A situation where this is not the case is when multiple uninterpreted symbols
block the rewrite (recall that TheSy only substitutes one placeholder per term
with symbolic examples). To illustrate, consider the case in Fig. 4 where both
the list x :: xs and p x are used in match expressions, so a case split on
p x ∈ {true, false} is needed. Consequently, TheSy only performs case splitting for
rewrite rules that require multiple match patterns but only one is blocked.

The splitting mechanism itself operates by copying the e-graph and applying
the term rewriting logic separately for each case. Each copy then yields a partition
of the existing equivalence classes. These partitions are intersected between
all cases, and each of the resulting intersections leads to merging of equivalence
classes in the original e-graph. It is worth noting that TheSy never needs to backtrack
a case split it has elected to apply. As a consequence, execution time is
not exponential in the total number of case splits performed, only in the nesting
level of such splits (which is bounded by 2 in our experiments).
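
The intersection step can be made concrete with a small Python sketch. It assumes each case's e-graph copy has been summarized as a dictionary from term names to class identifiers; this summary format and the function name `intersect_partitions` are assumptions for illustration.

```python
def intersect_partitions(partitions):
    """Groups of terms that share an equivalence class in every case.

    `partitions` is a list with one dict per case, each mapping a term to the
    id of its class in that case's copy of the e-graph. All dicts are assumed
    to have the same keys.
    """
    groups = {}
    for term in partitions[0]:
        # Two terms stay together only if their class ids agree case by case.
        signature = tuple(p[term] for p in partitions)
        groups.setdefault(signature, []).append(term)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, if `t1` and `t2` are merged in both the `true` and the `false` case while `t3` joins them in only one case, the result is the single group `["t1", "t2"]`, and only those two would be merged in the original e-graph.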

We compare TheSy to the most recent and closely related theory exploration
system, Hipster [23], which is based on random testing (backed by QuickSpec [38])
with proof automation from, and a frontend in, Isabelle/HOL [33]. Hipster
represents the culmination of several existing works on theory exploration
(see Sect. 6). Both systems generate a set of proved lemmas as output, each
such set encompassing a conceptual volume of knowledge that was discovered
automatically. We note that the same knowledge can be represented in various
ways, so directly comparing the sets of lemmas is going to be meaningless.


5.1 Evaluating Theory Exploration Quality 


We define a comparison method for two theory exploration systems A and B 
starting from a common initial theory (defined as a set of closed formulas) T. 
As a metric for the quality and efficacy of results obtained from theory explo- 
ration, and, therefore, their perceived usefulness, we use the notion of knowledge 
Fig. 5. A scatter plot showing the ratio of lemmas in theories discovered by each tool
that were subsumed by the theory discovered by its counterpart (T = TheSy, H = 
Hipster). Each point represents a single test case. The vertical axis shows how many of 
the lemmas discovered by Hipster were subsumed by those discovered by TheSy, and 
the horizontal axis shows the converse. 


(inspired by “knowledge base” in Theorema [8]). A theory T in a given logical 
proof system induces a collection of attainable knowledge, Kr = {p |T F p}, 
that is, characterized by the set of (true) statements that can be proven based 
on T. In practice, a “pure” notion of knowledge based on provability is imprac- 
tical, because most interesting logics are undecidable, and automated proving 
techniques cannot feasibly find proofs for all true statements. We, therefore, 
parameterize knowledge relative to a prover—a procedure that always termi- 
nates and can prove a subset of true statements. Termination can be achieved 
by restricting the space of proofs by either size or resource bounds. We say that 
TF p when a prover, S, is able to verify the validity of y in a theory T. A 
more realistic characterization of knowledge would then be KÊ = Ty | TË p}. 
Assuming that the prover S is fixed, a theory T’ is said to increase knowledge 
over T when KŻ, > KF. 

We utilize the notion of $K$ described above to test the knowledge gained by
A against that of B, and vice versa. We take the set of lemmas $T_A$ generated by
A and check whether it is subsumed by $T_B$, generated by B, by checking whether
$T_A \subseteq K^S_{T \cup T_B}$; we then carry out the same comparison with the
roles of A and B reversed. A working assumption is that both A and B include some
mechanism for screening redundant conjectures, that is, a component that receives
the current set of known lemmas $T_i$ and a conjecture $\varphi$ and decides
whether the conjecture is redundant. It is important to choose $S$ such that
whenever A (or B) discards $\varphi$ due to redundancy, it holds that
$\varphi \in K^S_{T_i}$.

Incorporating the solver into the comparison makes the evaluation resistant
to large amounts of trivial lemmas, as they will be discarded by A or B. It is
still possible for some lemmas to be “better” than others, so knowledge is not
uniformly distributed; this is hard to quantify, though. A few possible measures
of usefulness come to mind, such as lemma utilization in a task (such as
proof search), proof complexity, or matching to a given context, but given just
the exploration task, there is not sufficient information to apply them. A first
approximation is to consider the discovered lemmas themselves, i.e., $T_A \cup T_B$,
as representing proof objectives. In doing so, we pit A and B in direct contest
with one another. We choose this avenue because it is straightforward to apply,
admitting that it may be inaccurate in some cases.

To evaluate our approach and its implementation, we run both TheSy 
and Hipster on functional definitions collected from the TIP 2015 benchmark 
suite [11], specifically the IsaPlanner [21] benchmarks (85 benchmarks in total), 
for compatibility between the two systems. TIP benchmarks also contain goal 
propositions, but for the purpose of evaluating the exploration technique, these 
are redacted. This experiment uses the simple rewrite-driven congruence-closure 
decision procedure with a case split mechanism in the role of the solver, S, 
occurring in the definition of knowledge $K$. Hipster uses Isabelle/HOL’s
simplifier as a conjecture redundancy filtering mechanism, which is in itself a
simple rewrite-driven decision procedure; $S$ therefore provides a suitable
comparison. We compute the portion of lemmas found by Hipster that were provable
(by $S$) from TheSy’s results and vice versa. In other words, we check the ratio
given by $|T_A \cap K^S_{T \cup T_B}| / |T_A|$, which we denote
$T_B \,\%\, T_A$, in both directions. Figure 5 displays the ratios, where each
point represents a single test case. Points above the diagonal line represent
test cases where TheSy’s ratio was higher, and points below the line represent
test cases where Hipster’s ratio was higher. We conduct this experiment twice:
once with the case-splitting mechanism of TheSy turned off for its exploration,
and once with it turned on. (Hipster does not have such a switch as it always
generates concrete values.) The reason for this is that case splitting increases
the running time significantly (as we show next), so we want to evaluate its
contribution to the discovery of lemmas. Comparing the two charts, TheSy
performs reasonably well compared to Hipster without case splitting (TheSy’s
ratio was better in 48 of the 85 test cases and equal in 12), while enabling
case splitting leads to a clear advantage (TheSy’s ratio was better in 65 of the
85 test cases and equal in 6).
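
The ratio used in this comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `subsumption_ratio` and `toy_prover` are hypothetical names, lemmas are encoded as (left, right) pairs of strings, and the toy prover merely recognizes an equation or its mirror image, standing in for the bounded prover S. Returning 1.0 for an empty lemma set is likewise an assumption.

```python
def subsumption_ratio(lemmas_a, lemmas_b, base_theory, proves):
    """Fraction of lemmas_a that `proves` derives from base_theory plus lemmas_b."""
    known = set(base_theory) | set(lemmas_b)
    if not lemmas_a:
        return 1.0  # assumption: an empty lemma set counts as fully subsumed
    provable = [lemma for lemma in lemmas_a if proves(known, lemma)]
    return len(provable) / len(lemmas_a)

def toy_prover(theory, lemma):
    """Hypothetical stand-in for S: accepts reflexivity, a known equation, or its mirror."""
    lhs, rhs = lemma
    return lhs == rhs or (lhs, rhs) in theory or (rhs, lhs) in theory

# Hypothetical lemma sets for a single test case.
thesy_lemmas = [("x + 0", "x"), ("rev (rev l)", "l")]
hipster_lemmas = [("x", "x + 0")]

hipster_covered = subsumption_ratio(hipster_lemmas, thesy_lemmas, [], toy_prover)
thesy_covered = subsumption_ratio(thesy_lemmas, hipster_lemmas, [], toy_prover)
print(hipster_covered, thesy_covered)
```

Each test case contributes one point whose coordinates are the two ratios, which is how a scatter plot such as Fig. 5 can be assembled.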


Performance. To compare runtime efficiency, we consider the time it took to
fully explore the IsaPlanner test suite. We consider an exploration “full” when
it has finished enumerating all the terms, and associated candidate conjectures,
up to the depth bound with TheSy (k = 2) or the size bound with Hipster (s = 7),
and has checked them; or when a timeout of one hour is reached, whichever is
sooner. (Our experience shows that choosing larger values of k greatly affects
the run-time, but does not lead to many useful lemmas.) We then sort the
benchmarks from shortest- to longest-running for each of the tools, and report
the accumulated time to explore the first i benchmarks (i = 1..85). The results
are shown in the graph in Fig. 6, for Hipster, TheSy with case split disabled,
and TheSy with case split enabled.


Fig. 6. Time to fully explore the 85 IsaPlanner benchmarks. A full exploration is
considered one where either all terms up to the depth bound have been enumerated
or a timeout of 1 h has been reached. The y axis shows the amount of time needed
to complete the first x benchmarks, when they are sorted from shortest- to
longest-running. (Time scale is logarithmic; lower is better.)


In both configurations, TheSy is very fast for the lower percentiles, but begins
to slow down, due to case splitting, towards the end of the line. To illustrate,
in the 25th percentile TheSy was ~380 times faster (0.48 s vs. 182.47 s); in the
50th percentile, ~57 times faster (5.28 s vs. 305.37 s); and in the 75th
percentile, ~6 times faster (141.24 s vs. 883.8 s). Overall TheSy took 51.6K
seconds and Hipster 47.1K, meaning Hipster was ~1.1 times faster. It is evident
from the chart that case splitting is largely responsible for the longer
execution times. Without case splitting, TheSy is much faster, and completes all
85 benchmarks in less time than it takes Hipster. Of course, in that mode of
operation, TheSy finds fewer lemmas (as shown in Fig. 5), but is still superior
to Hipster. Future work needs to focus on improving the case-splitting
mechanism, similar to the treatment of case splits in SAT and SMT solvers,
allowing TheSy to deal with such theories more efficiently.
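
The accumulated-time curve described above can be computed as in the following sketch. The function name and the run times are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from the experiment.

```python
from itertools import accumulate

def accumulated_times(run_times, timeout=3600.0):
    """Sort runs from shortest to longest (capped at the timeout) and
    return the running total after each of the first i benchmarks."""
    capped = sorted(min(t, timeout) for t in run_times)
    return list(accumulate(capped))

# Hypothetical per-benchmark exploration times in seconds.
times = [5.0, 0.5, 4000.0, 120.0]
curve = accumulated_times(times)
print(curve)
```

The last entry of the curve is the total time for the whole suite, which is the quantity compared between the tools in the text.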


5.2 Efficacy to Automated Proving 


While the mission statement of TheSy is solely to provide lemmas based on core 
theories, we wish to claim that such discovered theories are beneficial toward 
proving theorems in general, based on the same core theory. We used a collec- 
tion of benchmarks for induction proofs used by CVC4 [37], and conducted the 
following experiment: First, the proof goals are skipped and only the symbol dec- 
larations and provided axioms are used to construct an input to TheSy. Then, 
whenever a new lemma is discovered and passes through the prover, we also 
attempt to prove the goal—utilizing the same mechanism used for vetting con- 
jectures. As soon as the latter goes through, the exploration process is aborted, 
and all lemmas collected are discarded. The experiments are thus independent 
across the individual benchmarks.

Table 1. Results of the CVC4 benchmark suite (number of successful proofs in each
category).


            | Total | Z3 | CVC4 | CVC4+ig | TheSy
clam        |   136 | 25 |   20 |     108 |   102
hipspec     |    42 |  6 |    7 |      33 |    29
isaplanner  |    87 | 35 |   34 |      79 |    47
leon        |    46 |  9 |    9 |      40 |     9
Total       |   311 | 75 |   70 |     260 |   187

Fig. 7. Accumulated time-to-solve for each of the benchmark suites from the CVC4
collection. The y axis shows the amount of time needed to complete the first x
(successful) proofs, when benchmarks are sorted from shortest- to longest-running.


Even though this setting is unfavorable to TheSy (it does not take advantage
of the fact that theory exploration can be done offline and its results re-used
for proofs over the same core theory), we report considerable success in
solving these benchmarks. Out of the 311 benchmarks, our theory exploration
combined with simple-minded induction was able to prove 187 (with a 5-min
timeout, same as in the original CVC4 experiments). For comparison, Z3 and CVC4
(without conjecture generation) were able to prove 75 and 70 of them,
respectively. This shows that the majority of instances were not solvable
without the use of induction. CVC4 with its conjecture generation enabled was
able to solve 260 of them. Table 1 shows the number of successful proofs
achieved for each of the four suites. Figure 7 shows the accumulated time
required for the benchmarks; the vast majority of the success cases occur early
on, because in some cases a rather small auxiliary lemma is all that is needed
to make the proof go through.


6 Related Work


Equality Graphs. Originally brought into use for automated theorem proving [15],
e-graphs were popularized as a mechanism for implementing low-level compiler
optimizations [41], under the name PEGs. These e-graphs can be used to represent
a large program space compactly by packing together equivalent programs.
In that sense they are similar to Version Space Algebras [26], but their prime
objective is entirely different. While VSAs focus on efficient intersections,
e-graphs are used to saturate a space of expressions with all equality relations
that can be inferred. They have found use in optimizing expressions for more
than just speed, for example to increase numerical stability of floating-point
programs in Herbie [34]. There are two key differences in the way e-graphs are
used in this work compared to prior work: (i) equality laws are neither
hard-coded nor fixed; they are enriched as the system proves more lemmas
automatically; (ii) saturation cannot be guaranteed or even obtained in all
cases, which we overcome by a bound on rewrite-rule application depth. (The
latter point is an indirect consequence of the former.)


Automated Theorem Provers. Many systems rely on known theorems or are 
designed to support users in semi-automated proving. Congruence closure is also 
a proven method for tautology checking in automated theorem provers, such as 
Vampire [25], and is used as a decision procedure for reasoning about equality 
in leading SMT solvers Z3 [14] and CVC4 [5]. There, it is limited mostly to 
first-order reasoning, but can essentially be applied unchanged to higher-level 
scenarios such as ours. 

Related to theory exploration, but using separate techniques, are Zipperposition
[13], and the conjecture generation mechanism implemented as part of
the induction prover in CVC4 [37]. It should be noted that these are directed
toward a specific proof goal, as opposed to theory exploration, which is presumed
to be an offline phase. As such, the above two techniques incorporate generation
of inductive hypotheses into the saturation proof search and the SMT procedure,
respectively.


Theory Exploration. IsaCoSy [22] pioneered the use of synthesis techniques for
bottom-up lemma discovery. IsaCoSy combines equivalence reduction with
counterexample-guided inductive synthesis (CEGIS [40]) for filtering candidate
lemmas. This requires a solver capable of generating counterexamples to
equivalence. Subsequent development was based on random generation of test values,
as implemented in QuickSpec [38] for reasoning about Haskell programs, later
combined with automated provers for checking the generated conjectures [10, 20].
We have mentioned the deficiencies of using concrete values (as opposed to
symbolic ones) and random testing in Sect. 1 and make an empirical comparison
with Hipster, a descendant of IsaCoSy and QuickSpec, in Sect. 5.


Inductive Synthesis. In the area of SyGuS [3], tractable bottom-up enumeration
is commonly achieved by some form of equivalence reduction [39]. When dealing
with concrete input-output examples, observational equivalence [2,42] is very
effective. The use of symbolic examples in synthesis has been suggested [17], but
to the best of our knowledge, ours is the only setting where symbolic
observational equivalence has been applied. Inductive synthesis, in combination
with abduction [16], has also been used to infer specifications [1], although not
as an exploration method but as a supporting mechanism for verification.


7 Conclusion 


We described a new method for theory exploration, which differentiates itself
from existing work by basing the reasoning on a novel engine based on term
rewriting. The new approach differs from previous work, specifically those based
on testing techniques, in that:


1. This lightweight reasoning is purely symbolic, supports value abstraction,
and performs better than prior art.

2. Functions are naturally treated as first-class objects, without specific support
implementation.

3. The only needed input is the code defining the functions involved; no
support code such as a specific theory solver or random value generators is
required.

4. TheSy has a unique feedback loop between the prover and the synthesizer,
allowing more conjectures to be found and proofs to succeed.


By creating a feedback loop between the four different phases (term generation,
conjecture inference, conjecture screening, and induction prover), this system
manages to efficiently explore many theories. This goes beyond similar feedback
loops in existing tools, which aim to reduce false and duplicate conjectures. As
explained in Subsect. 4.2, this form is also present in TheSy, but TheSy utilizes
this feedback in more phases of the computation.

Theory exploration carries practical significance to many automated reason- 
ing tasks, especially in formal methods, verification and optimization. Complex 
properties lead to an ever-growing number of definitions and associated lemmas, 
which constitute an integral part of proof construction. These lemmas can be 
used for SMT solving, automated and interactive theorem proving, and as a 
basis for equivalence reduction in enumerative synthesis. The term rewriting- 
based method that we presented in this paper is simple, highly flexible, and has 
already shown results surpassing existing exploration methods. The generated 
lemmas allow even this simple method to prove conjectures that normally require 
sophisticated SMT extensions. Our main conclusion is that deductive techniques 
and symbolic evaluation can greatly contribute to theory exploration, in addition 
to their existing applications in invariant and auxiliary conjecture inference.


Abstract. We present a certified SMT QF_BV solver CoqQFBV built
from a verified bit blasting algorithm, Kissat, and the verified SAT certificate
checker GratChk in this paper. Our verified bit blasting algorithm supports the
full QF_BV logic of SMT-LIB; it is specified and formally verified in the proof
assistant Coq. We compare CoqQFBV with CVC4, Bitwuzla, and Boolector on
benchmarks from the QF_BV division of the single query track in the 2020 SMT
Competition, and real-world cryptographic program verification problems.
CoqQFBV surprisingly solves more program verification problems with
certification than the 2020 SMT QF_BV division winner Bitwuzla without
certification.


1 Introduction 


Satisfiability Modulo Theories (SMT) solvers for the Quantifier-Free Bit-Vector
(QF_BV) logic have been used to verify programs with bit-level accuracy [9,
10]. In such applications, a program verification problem is reformulated as an
SMT QF_BV query. An SMT QF_BV solver is then invoked to compute a query
result. The query result in turn decides the answer to the program verification
problem. For cryptographic assembly programs, a missing carry or borrow flag
will result in incorrect computation. Bit-accurate verification is thus necessary
for cryptographic programs. SMT QF_BV solvers in fact have been employed to
verify such programs [8,25]. These solvers nonetheless are very complex programs
with possibly unknown bugs [7,18]. Since bugs in SMT QF_BV solvers may
induce incorrect query results, program verification cannot be taken without a
grain of salt when SMT QF_BV solvers are employed.

In order to check SMT QF_BV query results independently, SMT QF_BV
solvers can generate certificates to validate their answers. In the LFSC
certificates [14,23], for instance, an SMT QF_BV query result is certified by
correct bit blasting and Boolean Satisfiability (SAT) solving. Such certificates
demonstrate that the SMT QF_BV query is reduced to a Boolean SAT query correctly
and the corresponding SAT query is solved correctly. Although one can certify SAT
query results with certificates from SAT solvers [24], it is not always easy to
certify correct bit blasting due to complex arithmetic operations in SMT QF_BV
queries. Developing correct and efficient checkers for SMT QF_BV certificates
can be very challenging. Indeed, an LFSC certificate checker based on the proof
assistant Coq has been developed to improve confidence [12]. Yet the Coq-based
certificate checker does not fully support arithmetic operations and thus cannot
certify results of SMT QF_BV queries with complicated arithmetic operations.
Consequently, the correctness of cryptographic programs still relies on the
correctness of SMT QF_BV solvers or their unverified certificate checkers.

In this paper, we take a more direct approach to ensure the correctness of
SMT QF_BV query results. Instead of certifying correct bit blasting for every
SMT QF_BV query, we specify a bit blasting algorithm and prove its correctness
in the proof assistant Coq. In order to formalize the correctness of our bit
blasting algorithm, we develop a formal bit-vector theory in Coq. Naturally,
the formal theory has to support all arithmetic functions (addition, subtraction,
multiplication, division, and remainder) for both signed and unsigned
representations as needed in SMT-LIB [3]. Based on our new bit-vector theory, we
give a formal semantics for SMT QF_BV queries in Coq. Our semantics follows the
SMT-LIB semantics carefully. Particularly, division and remainder are total
arithmetic operations even when the divisor is zero. Using our Coq bit-vector
theory and semantics, we prove that our bit blasting algorithm always returns a
corresponding Boolean formula correctly on any SMT QF_BV query. Since our
algorithm has been formally verified, bit blasting is always correct and need not
be certified. Through the OCaml program extracted from our verified bit blasting
algorithm, a corresponding SAT query is obtained for each SMT QF_BV
query and sent to a SAT solver. A SAT certificate checker suffices to validate
SAT query results and hence the correctness of answers to SMT QF_BV queries.
Since neither complicated SMT QF_BV solvers nor their certificate checkers are
trusted, our work can improve the confidence of SMT QF_BV query results.

To our knowledge, our bit-vector theory is the first Coq formalization
designed for bit blasting queries from the QF_BV logic of SMT-LIB. Our semantics
is the first Coq formalization for full SMT QF_BV queries. We are not aware
of any verified bit blasting algorithm or program for full SMT QF_BV queries
of SMT-LIB at the time of writing. Even if the correctness of its results could
be ensured, our certified SMT QF_BV solver CoqQFBV would not be very useful
if it were extremely inefficient. In order to evaluate its performance, we run
CoqQFBV on benchmarks from the QF_BV division of the single query track in the
2020 SMT Competition. With the same memory and time limits as in the
competition, our solver successfully finishes 88.72% of the 6861 queries with
certification. In comparison, CVC4 with its certificate checker solves 55.97%
with certification, and the division winner Bitwuzla solves 98.22% of the
benchmarks without certification. Our certified solver outperforms CVC4 with
certification significantly. Generating and checking certificates makes our
certified solver finish about 10% fewer queries than the division winner. The
price of accuracy perhaps is not unacceptable for the benchmarks in the
competition. To further evaluate CoqQFBV, the certified solver is used to verify
linear arithmetic assembly programs from various cryptography libraries such as
OpenSSL [30]. CoqQFBV gives certified answers to 96.88% of the 96 SMT QF_BV
queries from real-world cryptographic program verification. CVC4 with its
certificate checker certifies 19.79%. Among efficient SMT QF_BV solvers without
certification, Boolector solves 100% and Bitwuzla solves 91.67% of
the queries. Intriguingly, our certified SMT QF_BV solver outperforms the 2020
division winner Bitwuzla in queries from real-world verification problems. Our
certified solver is probably useful for real-world verification problems.


Related Work. As mentioned, SMT certificate generation and checking are
challenging. There are few efforts developing SMT QF_BV certificate checkers, let
alone verified ones. CVC4 is able to produce unsatisfiability certificates for
QF_BV queries, and is also equipped with an (unverified) certificate checker [14].
SMTCoq [12] is proposed to check certificates from the SMT solvers veriT and
CVC4. It supports fragments of several logics including the QF_BV logic.
Moreover, its correctness is formally proved in Coq. However, the QF_BV logic is
not fully supported by SMTCoq. Z3 also supports certificate generation for the
QF_BV logic [19]. The proofs can be reconstructed, thus checked, within the proof
assistants HOL4 and Isabelle [6]. But the lack of details in Z3’s generated
certificates makes proof reconstruction particularly challenging.

Following an approach similar to ours, GL is a framework for bit blasting
finitely bounded ACL2 theorems into SAT queries [28]. Its bit blasting algorithm
is formally verified in ACL2. Though it is not designed for SMT-LIB, most of
the operations defined in the QF_BV logic are supported, except division and
concatenation for instance. A bit blasting algorithm is defined and verified in
HOL4 as well [13]. Neither [28] nor [13] aims to develop a scalable SMT QF_BV
solver. CoqQFBV accepts SMT-LIB inputs with fully supported QF_BV logic
while adopting performance optimizations such as caches.

In Isabelle and HOL4, one can use the bit-vector libraries to conform to
SMT-LIB operations, see [17] for example. Within Coq, coq-bits
is a formalization of logical and arithmetic operations on bit-vectors [15]. The
library provides the mapping between bit-vector operations and abstract number
operations. Different from our theory, it does not support division/remainder or
signed operations. Why3 [11] provides a bit-vector theory which is formalized
in Coq too. It defines the division by zero in a different way from SMT-LIB.
Moreover, the operations are defined based on integer operations. Our new
bit-vector theory instead defines bit-vector operations through bit manipulation.
It is more suitable for the correctness proof of bit blasting algorithms.

We have the following organization. After the introduction, an overview is
given in Sect. 2. Section 3 reviews preliminaries. Our formal bit-vector theory
is presented in Sect. 4. It is followed by the formal semantics of SMT QF_BV
queries (Sect. 5). The correctness of our bit blasting algorithm is established in
Sect. 6. Section 7 outlines the construction of our certified SMT QF_BV solver.
Experiments are presented in Sect. 8. Section 9 concludes our presentation.


152 X. Shi et al. 


2 Methodology Overview 


Given an SMT QF_BV query, a bit blasting algorithm computes a Boolean
formula such that the SMT QF_BV query is satisfiable if and only if the
Boolean formula is satisfiable. The QF_BV logic contains arithmetic operations
for bit-vectors. Computing an equi-satisfiable Boolean formula for an arbitrary
SMT QF_BV query can be very complicated and susceptible to errors. Our goal
is to construct a correct bit blasting program for every SMT QF_BV query. The
correctness of the program moreover is verified by the proof assistant Coq to
minimize gaps or even errors in hand-written proofs.

Our construction is based on a new formal bit-vector theory coq-nbits 
(Sect. 4). In coq-nbits, we define bit-vectors and their functions on top of the 
Coq data type for Boolean sequences. In order to support the QF_BV logic 
of SMT-LIB fully, five arithmetic bit-vector functions (addition, subtraction, 
multiplication, division, and remainder) are defined in our formal theory. To 
establish the correctness of our definitions, formal proofs are provided to relate 
bit-vector functions with their arithmetic counterparts. For instance, we show the 
number represented by the output of the bit-vector negation function is indeed 
the arithmetic negation of the number represented by the input bit-vector. 

Using our coq-nbits theory, we then give a formal semantics for
SMT QF_BV queries as defined in SMT-LIB (Sect. 5). In our formalization,
a QF_BV predicate denotes a Boolean value; and a QF_BV expression denotes
a bit-vector. An SMT QF_BV query is formalized as a Boolean combination of
QF_BV predicates on QF_BV expressions over QF_BV variables and bit-vector
constants. In order to demonstrate the correctness of our formal semantics for
SMT QF_BV queries, formal proofs are provided to show that our formal semantics
coincides with that defined in SMT-LIB.

Our bit blasting algorithm is given in Coq (Sect. 6). It extends Tseitin
transformation for Boolean formulae to SMT QF_BV queries. More precisely, a
QF_BV predicate is transformed to a literal with a Boolean formula; a QF_BV
expression is transformed to a literal sequence with a Boolean formula. Using
our formalization of SMT QF_BV queries, the correctness of the bit blasting
algorithm is established in Coq by mutual induction. To improve efficiency, our
bit blasting algorithm is further optimized with more economic transformations
and a cache. The optimized bit blasting algorithm is also verified with formal
Coq proofs.
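
To make the idea of transforming an expression into "a literal sequence with a Boolean formula" concrete, the following Python sketch bit-blasts a single bitwise AND using the standard Tseitin clauses. It is a generic illustration and not the verified algorithm of this paper; the function names, the DIMACS-style integer encoding of literals, and the explicit fresh-variable counter are all assumptions.

```python
from itertools import product

def blast_and(xs, ys, next_var):
    """Bit-blast a bitwise AND of two equally wide literal sequences.

    Returns (output literals, CNF clauses, next unused variable index).
    Literals are non-zero integers; a negative integer is a negated variable.
    """
    outs, cnf = [], []
    for x, y in zip(xs, ys):
        g = next_var
        next_var += 1
        # Tseitin clauses encoding g <-> (x AND y).
        cnf += [[-g, x], [-g, y], [g, -x, -y]]
        outs.append(g)
    return outs, cnf, next_var

def satisfies(cnf, env):
    """Evaluate a CNF formula under an environment mapping variables to Booleans."""
    return all(any(env[abs(l)] == (l > 0) for l in clause) for clause in cnf)

# Two 2-bit operands over variables 1..4; fresh variables start at 5.
outs, cnf, fresh = blast_and([1, 2], [3, 4], 5)

# Exhaustively confirm that the clauses force each output bit to equal the AND.
for values in product([False, True], repeat=6):
    env = dict(zip(range(1, 7), values))
    expected = env[5] == (env[1] and env[3]) and env[6] == (env[2] and env[4])
    assert satisfies(cnf, env) == expected
print(outs, len(cnf))
```

The exhaustive loop plays the role that a correctness proof plays in the verified setting: it checks, for this tiny instance only, that the generated clauses constrain the output literals exactly as the operation requires.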

Our formally verified bit blasting algorithm is written in the Coq specification
language. It is not yet a program compilable into executable binary codes.
Using the code extraction mechanism in Coq, an OCaml program is extracted
from our verified bit blasting algorithm. The OCaml program takes expressions
in our formal SMT QF_BV query syntax as inputs and returns expressions in
our formal syntax for Boolean formulae as outputs. SAT solvers can be employed
to decide satisfiability of output Boolean formulae. Their certificates can be
validated by SAT certificate checkers independently (Sect. 7).


3 Preliminaries


Let $v$ be a Boolean variable with values ff and tt. A literal is of the form
$v$ or $\neg v$. A clause is a disjunction $l_0 \vee l_1 \vee \cdots \vee l_k$ of
literals $l_0, l_1, \ldots, l_k$. A Boolean formula in the conjunctive normal
form (CNF) is a conjunction $c_0 \wedge c_1 \wedge \cdots \wedge c_m$ of clauses
$c_0, c_1, \ldots, c_m$. A SAT query is a Boolean CNF formula. An environment maps
Boolean variables to their values. Given a SAT query, the Boolean satisfiability
problem is to decide if the query evaluates to tt on some environment.
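
These definitions translate directly into a small executable model. The sketch below is illustrative only: it assumes a DIMACS-style encoding (a clause is a list of non-zero integers, negative meaning negated) and decides satisfiability by enumerating all environments, which is practical only for a handful of variables.

```python
from itertools import product

def eval_cnf(cnf, env):
    """A CNF formula is tt iff every clause has at least one true literal."""
    return all(any(env[abs(l)] == (l > 0) for l in clause) for clause in cnf)

def brute_force_sat(cnf):
    """Return a satisfying environment, or None if the query is unsatisfiable."""
    variables = sorted({abs(l) for clause in cnf for l in clause})
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if eval_cnf(cnf, env):
            return env
    return None

# (v1 or not v2) and (v2 or v3) and (not v1 or not v3)
query = [[1, -2], [2, 3], [-1, -3]]
model = brute_force_sat(query)
unsat = brute_force_sat([[1], [-1]])
print(model, unsat)
```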

A bit-vector of width $w$ is written as $\#b\,b_{w-1}b_{w-2}\cdots b_0$ with
$b_i \in \{0,1\}$ for $0 \le i < w$. In the unsigned representation, the
bit-vector $\#b\,b_{w-1}b_{w-2}\cdots b_0$ denotes the natural number
(non-negative integer) $\sum_{0 \le i < w} b_i 2^i$; in two’s complement
(signed) representation, it denotes the integer
$\sum_{0 \le i < w-1} b_i 2^i - 2^{w-1} b_{w-1}$. For instance, #b1010 denotes
10 and −6 in the unsigned and two’s complement representations respectively. We
use $\mathit{bv2nat}(bv)$ for the natural number denoted by the bit-vector $bv$
in the unsigned representation; and $\mathit{nat2bv}(w, i)$ stands for
the bit-vector of width $w$ representing the natural number $i$ modulo $2^w$.
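
The two readings of a bit-vector can be checked with a short Python model. It assumes, purely for illustration, that a bit-vector is given as a string of binary digits with the most significant bit first (as in the #b notation) and has width at least one for the signed reading; `bv2int` is a hypothetical name for the two’s complement interpretation.

```python
def bv2nat(bits):
    """Unsigned value: sum of b_i * 2^i, with b_0 the rightmost character."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))

def bv2int(bits):
    """Two's complement value: low w-1 bits minus 2^(w-1) times the top bit."""
    w = len(bits)
    return bv2nat(bits[1:]) - (int(bits[0]) << (w - 1))

def nat2bv(w, n):
    """Width-w bit string representing n modulo 2^w."""
    return format(n % (1 << w), "0{}b".format(w)) if w > 0 else ""

print(bv2nat("1010"), bv2int("1010"), nat2bv(4, 26))
```

The last call shows the modular behaviour: 26 modulo 16 is 10, so the result is again the bit pattern 1010.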

Let $bv = \#b\,b_{w-1}b_{w-2}\cdots b_0$ and $cv = \#b\,c_{u-1}c_{u-2}\cdots c_0$
be bit-vectors of widths $w$ and $u$ respectively. The following QF_BV
operations are defined in the QF_BV logic of SMT-LIB:
$\mathit{concat}\ bv\ cv \triangleq \#b\,b_{w-1}b_{w-2}\cdots b_0\,c_{u-1}c_{u-2}\cdots c_0$
is the concatenation of $bv$ and $cv$;
$\mathit{extract}\ i\ j\ bv \triangleq \#b\,b_i b_{i-1}\cdots b_j$ extracts
bits from $bv$ where $0 \le j \le i < w$; $\mathit{bvnot}\ bv$,
$\mathit{bvand}\ bv\ cv$, and $\mathit{bvor}\ bv\ cv$ are the bitwise
complement, and, or operations respectively. Additionally,
$\mathit{bvneg}\ bv \triangleq \mathit{nat2bv}(w, 2^w - \mathit{bv2nat}(bv))$ is
the arithmetic negation operation;
$\mathit{bvadd}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) + \mathit{bv2nat}(cv))$
is the arithmetic addition operation; and
$\mathit{bvmul}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) \times \mathit{bv2nat}(cv))$
is the arithmetic multiplication operation. The arithmetic division and
remainder operations are

\[
\mathit{bvudiv}\ bv\ cv \triangleq
\begin{cases}
\mathit{nat2bv}(w, 2^w - 1) & \text{if } \mathit{bv2nat}(cv) = 0 \\
\mathit{nat2bv}(w, \mathit{bv2nat}(bv) \div \mathit{bv2nat}(cv)) & \text{otherwise}
\end{cases}
\]

\[
\mathit{bvurem}\ bv\ cv \triangleq
\begin{cases}
bv & \text{if } \mathit{bv2nat}(cv) = 0 \\
\mathit{nat2bv}(w, \mathit{bv2nat}(bv) \bmod \mathit{bv2nat}(cv)) & \text{otherwise.}
\end{cases}
\]

Note that the arithmetic division and remainder operations are defined
even when the divisor represents the number zero. Finally, the operation
$\mathit{bvshl}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) \times 2^{\mathit{bv2nat}(cv)})$
shifts the bit-vector $bv$ to the left by $\mathit{bv2nat}(cv)$ bits;
$\mathit{bvlshr}\ bv\ cv \triangleq \mathit{nat2bv}(w, \mathit{bv2nat}(bv) \div 2^{\mathit{bv2nat}(cv)})$
shifts the bit-vector $bv$ to the right by $\mathit{bv2nat}(cv)$ bits. In
addition to bit-vector operations, the QF_BV logic of SMT-LIB defines QF_BV
predicates on bit-vectors. The predicate $\mathit{bveq}\ bv\ cv$ is true when
the bit-vectors $bv$ and $cv$ are equal; $\mathit{bvult}\ bv\ cv$ is true if
$\mathit{bv2nat}(bv) < \mathit{bv2nat}(cv)$. In the QF_BV logic of
SMT-LIB, both operands of binary operations and predicates must have the
same width. Overall, seventeen bit-vector operations and predicates are defined
in the QF_BV logic of SMT-LIB. Particularly, arithmetic division and remainder
operations with operands in both unsigned and two’s complement signed
representations are defined in SMT-LIB.

A QF_BV variable denotes a bit-vector. A QF_BV expression is constructed
from QF_BV operations over QF_BV variables and bit-vectors. An SMT QF_BV
query is a Boolean combination of QF_BV predicates on QF_BV expressions. Let
stores be mappings from QF_BV variables to bit-vectors. Given an SMT QF_BV
query, the satisfiability modulo QF_BV theory problem is to decide if the query
evaluates to tt on some store.
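
The arithmetic operations above can be modelled on natural numbers, which makes the treatment of a zero divisor easy to test. In this sketch a bit-vector of width `w` is represented simply by its unsigned value, and `wrap` plays the role of nat2bv followed by bv2nat; this encoding and the Python function signatures are assumptions made for illustration, not part of SMT-LIB or of the formalization in this paper.

```python
def wrap(w, n):
    """Value of nat2bv(w, n): n reduced modulo 2^w."""
    return n % (1 << w)

def bvneg(w, a):
    return wrap(w, (1 << w) - a)

def bvadd(w, a, b):
    return wrap(w, a + b)

def bvmul(w, a, b):
    return wrap(w, a * b)

def bvudiv(w, a, b):
    # Division by zero is total: the result is the all-ones bit-vector.
    return wrap(w, (1 << w) - 1) if b == 0 else wrap(w, a // b)

def bvurem(w, a, b):
    # Remainder by zero is total: the result is the dividend itself.
    return a if b == 0 else wrap(w, a % b)

def bvshl(w, a, b):
    return wrap(w, a << b)

def bvlshr(w, a, b):
    return wrap(w, a >> b)

def bvult(a, b):
    return a < b

print(bvudiv(4, 13, 0), bvurem(4, 13, 0), bvneg(4, 6), bvshl(4, 6, 2))
```

For a non-zero divisor the usual identity relating quotient and remainder holds, while for a zero divisor the two special cases apply, matching the case distinction in the definitions.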


4 Bit-Vector Theory 


We present our formal Coq bit-vector theory coq-nbits in this section. The 
coq-nbits theory supports bit-vectors in both unsigned and two’s complement 
signed representations. In coq-nbits, a bit-vector is represented by a Boolean 
sequence of the data type bits in the least significant bit-first order. 


Definition bits : Set := seq bool.


In the definition, bool and seq are the data types for Boolean values (false and
true) and sequences in Coq respectively. For instance, the bit-vector #b100 is
represented by [:: false; false; true] in coq-nbits.

Coq functions defined for sequences are applicable to bit-vectors. Particularly,
size bv computes the width of the bit-vector bv and bv ++ cv is the
concatenation of the bit-vectors bv and cv. It is also straightforward to define
auxiliary bit-vector functions. For example, zeros n returns the bit-vector of
n false’s; ones n returns the bit-vector of n true’s; extract i j bv returns the
sub-sequence of the bit-vector bv with indices from j to i where
0 ≤ j ≤ i < size bv. Let a ≜ [:: false; false; true]. Then size a = 3 and
extract 2 1 a = [:: false; true].

Bitwise functions are defined as easily. For instance, the bitwise inverse func- 
tion maps each Boolean value to its complement: 


Definition invB bv : bits := map (fun b => ~~b) bv.


Other bitwise functions are defined similarly. Specifically, bitwise and andB,
bitwise or orB, logical left shift shlB, and logical right shift shrB are all defined
in coq-nbits. Let b ≜ [:: false; true; true]. We have invB b = [:: true;
false; false], andB a b = [:: false; false; true], and shlB 1 b = [:: false;
false; true].
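
The running examples above can be replayed outside Coq. The following is a minimal Python sketch, not part of coq-nbits: it assumes bit-vectors are modeled as Python lists of Booleans in least-significant-bit-first order, and the function bodies are illustrative definitions chosen only to agree with the behavior described in the text.

```python
# Bit-vectors as Python lists of bools, least significant bit first,
# mirroring `Definition bits : Set := seq bool`.
# All bodies below are assumptions made for illustration.

def zeros(n):
    return [False] * n

def ones(n):
    return [True] * n

def extract(i, j, bv):
    # Bits with indices j..i inclusive, assuming 0 <= j <= i < len(bv).
    return bv[j:i + 1]

def invB(bv):
    return [not b for b in bv]

def andB(bv, cv):
    return [x and y for x, y in zip(bv, cv)]

def shlB(n, bv):
    # Logical left shift: n zeros enter at the least significant end
    # and the width is preserved.
    return (zeros(n) + bv)[:len(bv)]

a = [False, False, True]   # #b100
b = [False, True, True]    # #b110

print(extract(2, 1, a), invB(b), andB(a, b), shlB(1, b))
```

Because the list is stored least significant bit first, a left shift inserts zeros at the front of the list and truncates at the back.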

Arithmetic bit-vector functions are slightly more complicated. To prove 
properties about arithmetic functions, coq-nbits provides conversion functions 
between bit-vectors and natural numbers. 


Definition to_N (bv : bits) : N :=
  foldr (fun b res => N_of_bool b + res * 2) 0 bv.


In the definition, to_N bv converts the bit-vector bv to a natural number where 
N_of_bool false = 0 and N_of_bool true = 1. The to_N function multiplies the 
previous result by two and adds the least significant bit b. For instance, to_N a 
= to_N [:: false; false; true] = 4. The function from_N w n, on the other hand, 
converts any natural number n to a bit-vector of width w. 


Fixpoint from_N (w : nat) (n : N) : bits :=
  match w with
  | 0 => [::]
  | S w => (N.odd n) :: (from_N w (N.div n 2))
  end.


The function first checks the width w. If the width is zero, it returns the empty
bit-vector. Otherwise, the function returns the bit-vector with the least significant
bit N.odd n and the remaining w − 1 bits representing n divided by two.
Observe that two Coq formalizations of natural numbers are used. The nat
theory uses the unary representation suitable for inductive proofs; N uses the
succinct binary representation. The following lemma is proved in Coq:


Lemma 1. The following properties hold: 


1. ∀bv, from_N (size bv) (to_N bv) = bv.
2. ∀w n, n < 2^w ⟹ to_N (from_N w n) = n.


The first property shows that bit-vectors can be converted to natural numbers 
and back to themselves. The second property shows that natural numbers can 
be converted to bit-vectors with sufficient widths and back to themselves. To see 
how they are used to prove properties about bit-vector functions in coq-nbits, 
consider the definition of the successor bit-vector function. 


Fixpoint succB (bv : bits) : bits :=
  match bv with
  | [::] => [::]
  | hd :: tl => if hd then false :: (succB tl) else true :: tl
  end.


If the input is the empty bit-vector, the function returns the empty bit-vector. 
Otherwise, succB checks the least significant bit of the input bit-vector. If the bit 
is true, the function computes the successor of the remaining bits and appends 
false as the least significant bit. If the least significant bit of the input is 
false, the function simply changes the least significant bit to true and copies 
the remaining bits. Using the conversion functions, the bit-vector successor is 
related to the arithmetic successor in the following lemma: 


Lemma 2. ∀bv, succB bv = from_N (size bv) ((to_N bv) + 1).


Lemma 2 says that succB bv does compute the bit-vector representing the arith- 
metic successor of the natural number represented by the bit-vector bv. Observe 
that the successor bit-vector function is correct when the input bit-vector is 
empty. It is also correct when there is overflow. Indeed, both sides are zeros of 
width size bv when overflow occurs. 
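
Lemmas 1 and 2 can be sanity-checked by exhaustive enumeration on small widths. The sketch below is a Python model written for illustration, assuming bit-vectors are LSB-first lists of Booleans; it is a test on finitely many inputs, not a substitute for the Coq proofs, and the helper all_bits is a hypothetical name introduced here.

```python
from itertools import product

def to_N(bv):
    # foldr (fun b res => N_of_bool b + res * 2) 0 bv
    res = 0
    for b in reversed(bv):
        res = int(b) + res * 2
    return res

def from_N(w, n):
    # Emit N.odd n, then recurse on n div 2, exactly w times.
    bits = []
    for _ in range(w):
        bits.append(n % 2 == 1)
        n //= 2
    return bits

def succB(bv):
    if not bv:
        return []
    hd, tl = bv[0], bv[1:]
    return [False] + succB(tl) if hd else [True] + tl

def all_bits(w):
    # Hypothetical helper: every bit-vector of width w.
    return [list(t) for t in product([False, True], repeat=w)]

for w in range(5):
    for bv in all_bits(w):
        assert from_N(len(bv), to_N(bv)) == bv                  # Lemma 1(1)
        assert succB(bv) == from_N(len(bv), to_N(bv) + 1)       # Lemma 2
    for n in range(2 ** w):
        assert to_N(from_N(w, n)) == n                          # Lemma 1(2)
```

The loop includes width 0 and the all-true inputs, which are the empty and overflow cases discussed above.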

Other arithmetic bit-vector functions are defined and proved in coq-nbits 
similarly. Specifically, the arithmetic negation negB, addition addB, subtrac- 
tion subB, unsigned multiplication mulB, unsigned division divB, and unsigned 
remainder remB functions are supported by coq-nbits. We give properties to 
relate the arithmetic functions for bit-vectors and natural numbers. 

Lemma 3. The following properties hold:


1. ∀bv cv, size bv = size cv ⟹ to_N (addB bv cv) = (to_N bv + to_N cv) mod
   2^(size bv).
2. ∀bv cv, to_N (mulB bv cv) = (to_N bv × to_N cv) mod 2^(size bv).
3. ∀bv n, divB bv (zeros n) = ones (size bv).
4. ∀bv cv, size bv = size cv ⟹ cv ≠ zeros (size cv) ⟹ to_N (divB bv cv) =
   (to_N bv) div (to_N cv).
5. ∀bv n, remB bv (zeros n) = bv.
6. ∀bv cv, size bv = size cv ⟹ cv ≠ zeros (size cv) ⟹ to_N (remB bv cv) =
   (to_N bv) mod (to_N cv).
7. ∀bv n, to_N (shlB n bv) = ((to_N bv) × 2^n) mod 2^(size bv).
8. ∀bv n, to_N (shrB n bv) = (to_N bv) div 2^n.



Let bv, cv be bit-vectors of width w. Lemma 3 shows that the natural number 
represented by the bit-vector addB bv cv is equal to the modular sum of the natu- 
ral numbers represented by bv and cv. Similarly, the natural number represented 
by mulB bv cv is equal to the modular product of the natural numbers repre- 
sented by bv and cv. The division and remainder functions in coq-nbits follow 
the SMT-LIB semantics. Specifically, the quotient of any bit-vector divided by 
zero is equal to the bit-vector of all true’s; the remainder of a bit-vector divided 
by zero is the bit-vector itself. For non-zero divisors, the division and remainder 
functions behave as expected. The natural number represented by the bit-vector 
divB bv cv is the quotient of the number represented by bv divided by the number 
represented by cv; and the bit-vector remB bv cv represents the remainder of the 
number represented by bv divided by the number represented by cv. Last but not 
least, the logical left (shlB) and right (shrB) shifts correspond to multiplication
and division by powers of two respectively. 
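
A few of the properties in Lemma 3 can be exercised numerically. In the Python sketch below, addB is a ripple-carry adder on LSB-first Boolean lists, while divB and remB are specification-level models that go through natural numbers; these bodies are assumptions made for illustration and are not the coq-nbits definitions.

```python
def to_N(bv):
    return sum(1 << i for i, b in enumerate(bv) if b)

def from_N(w, n):
    return [((n >> i) & 1) == 1 for i in range(w)]

def addB(bv, cv):
    # Ripple-carry addition on equal-width operands; the final carry is dropped.
    out, carry = [], False
    for x, y in zip(bv, cv):
        out.append(x ^ y ^ carry)
        carry = (x and y) or (carry and (x ^ y))
    return out

def divB(bv, cv):
    # SMT-LIB convention: division by zero yields the all-ones bit-vector.
    d = to_N(cv)
    return [True] * len(bv) if d == 0 else from_N(len(bv), to_N(bv) // d)

def remB(bv, cv):
    # SMT-LIB convention: remainder by zero yields the dividend.
    d = to_N(cv)
    return list(bv) if d == 0 else from_N(len(bv), to_N(bv) % d)

w = 4
seven, nine, zero = from_N(w, 7), from_N(w, 9), from_N(w, 0)
print(to_N(addB(seven, nine)))   # (7 + 9) mod 16
print(divB(seven, zero), remB(seven, zero))
```

The modular sum wraps to zero here because 7 + 9 equals 2^4, which illustrates why Lemma 3(1) is stated modulo 2^(size bv).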

coq-nbits also provides comparison predicates. In addition to the equality
predicate == inherited from Boolean sequences, ltB bv cv and leB bv cv compare
the natural numbers represented by the bit-vectors bv and cv. Properties about
comparison predicates have also been proved in Coq.


Lemma 4. The following properties hold: 


1. ∀bv cv, size bv = size cv ⟹ ltB bv cv = (to_N bv < to_N cv).
2. ∀bv cv, size bv = size cv ⟹ leB bv cv = (to_N bv ≤ to_N cv).


In addition to arithmetic functions and predicates in the unsigned represen- 
tation, our formal bit-vector theory moreover defines arithmetic functions and 
predicates for bit-vectors in two’s complement representation. For the signed 
representation, bit-vectors are converted to integers by the to_Z function. Arith- 
metic bit-vector functions and predicates in the signed representation are related 
to arithmetic integer functions and predicates as follows. 


Lemma 5. The following properties hold: 


1. ∀bv, ¬(msb bv ∧ dropmsb bv = zeros (size bv − 1)) ⟹ to_Z (negB bv) =
   −to_Z bv.
2. ∀bv n, 1 < size bv ⟹ to_Z (sarB n bv) = (to_Z bv) quot 2^n.
3. ∀bv cv, size bv = size cv ⟹ to_Z (mulB (sext (size cv) bv) (sext (size bv)
   cv)) = to_Z bv × to_Z cv.
4. ∀bv cv, 1 < size bv ⟹ size bv = size cv ⟹ [¬(msb bv ∧ dropmsb bv = zeros
   (size bv − 1)) ∨ cv ≠ ones (size cv)] ⟹ to_Z (sdivB bv cv) = (to_Z bv) quot
   (to_Z cv).
5. ∀bv cv, 1 < size bv ⟹ size bv = size cv ⟹ to_Z (sremB bv cv) =
   (to_Z bv) rem (to_Z cv).
6. ∀bv cv, size bv = size cv ⟹ sltB bv cv = (to_Z bv < to_Z cv).
7. ∀bv cv, size bv = size cv ⟹ sleB bv cv = (to_Z bv ≤ to_Z cv).


In the lemma, sext n bv extends the bit-vector bv by n bits with the sign
bit of bv, msb bv returns the sign bit of bv, and dropmsb bv drops the sign bit
of bv. quot and rem are the quotient and remainder functions for Coq integers.
Consider, for instance, the signed division function sdivB bv cv in coq-nbits
(Lemma 5(4)). If the dividend bv is of width > 1, the widths of bv and the
divisor cv are equal, and bv is not of the form #b100···0 or cv is not of the
form #b11···1, then the bit-vector sdivB bv cv represents the quotient of the
integers represented by bv and cv. The condition may appear counter-intuitive.
To see why it is necessary, consider bv = #b100···0 and cv = #b11···1 both of
width w. bv and cv thus represent the integers −2^(w−1) and −1 respectively. Their
quotient 2^(w−1) however cannot be represented by bit-vectors of width w in two’s
complement representation. The corner input case is hence excluded. The corner
case is also excluded from the arithmetic negation function (Lemma 5(1)).
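
The excluded corner case is easy to reproduce for a concrete width. The Python sketch below assumes w = 4 and models to_Z and a truncating quotient directly on integers; the names representable and quot are introduced here for illustration only.

```python
def to_Z(bv):
    # Two's complement value of an LSB-first bit list of width >= 1.
    w = len(bv)
    n = sum(1 << i for i, b in enumerate(bv) if b)
    return n - (1 << w) if bv[-1] else n

def representable(z, w):
    # Range of width-w two's complement integers.
    return -(1 << (w - 1)) <= z < (1 << (w - 1))

def quot(a, b):
    # Quotient truncated toward zero (b must be non-zero).
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

w = 4
bv = [False] * (w - 1) + [True]   # #b1000
cv = [True] * w                   # #b1111

q = quot(to_Z(bv), to_Z(cv))
print(to_Z(bv), to_Z(cv), q, representable(q, w))
```

For width 4 the quotient of −8 and −1 is 8, which lies outside the range −8 to 7; every other pair with a non-zero divisor yields a representable quotient.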

The coq-nbits theory has several important differences from the prior Coq
formalization in [15]. Our formal bit-vector theory supports both unsigned and 
two’s complement signed representations. It also provides the arithmetic division 
and remainder functions. Since these features are needed in the QF_BV logic of 
SMT-LIB, they are essential to the formalization of SMT QF_BV queries. Such 
important features unfortunately are lacking in the prior formalization. Another 
noted difference is the numeric representations used in theory developments. 
Since integers are needed for the QF_BV logic, coq-nbits naturally uses binary 
representations for integers and natural numbers in Coq. The prior formalization 
on the other hand is mainly based on the unary natural number representation 
but provides conversion to positive integers in the binary representation. 


5 Theory for SMT QF_BV Queries


Using coq-nbits, we formalize SMT QF_BV queries. Our formalization consists
of two parts: a syntactic representation for SMT QF_BV queries in Coq
inductive types and a formal semantics in our bit-vector theory coq-nbits.


5.1 Syntax of SMT QF_BV Queries


An SMT QF_BV query is a Coq term of the data type bexp. It can be constants
Bfalse or Btrue, a unary predicate Bnot, or binary predicates Band or Bor for
Boolean connectives. Additionally, Bbveq and Bbvult with two arguments of the
data type exp are binary QF_BV predicates.


Inductive bexp : Type :=
| Bfalse : bexp | Btrue : bexp
| Bnot : bexp -> bexp
| Band : bexp -> bexp -> bexp | Bor : bexp -> bexp -> bexp
| Bbveq : exp -> exp -> bexp | Bbvult : exp -> exp -> bexp
(* other QF_BV predicates *)
with exp : Type :=
| Evar : var -> exp | Econst : bits -> exp
| Ebvnot : exp -> exp
| Ebvand : exp -> exp -> exp | Ebvor : exp -> exp -> exp
| Ebvshl : exp -> exp -> exp | Ebvlshr : exp -> exp -> exp
| Ebvneg : exp -> exp
| Ebvadd : exp -> exp -> exp | Ebvmul : exp -> exp -> exp
| Ebvudiv : exp -> exp -> exp | Ebvurem : exp -> exp -> exp
| Eextract : nat -> nat -> exp -> exp
| Econcat : exp -> exp -> exp
| Ebvsub : exp -> exp -> exp
(* other QF_BV operations *)
.


A Coq term of the data type exp represents a QF_BV expression. It can be
a QF_BV variable Evar vid with a variable identifier vid : var, a bit-vector
constant Econst bv with bv : bits, a bitwise-not operation Ebvnot e0, a bitwise-and
operation Ebvand e0 e1, a bitwise-or operation Ebvor e0 e1, a logical left-shift
operation Ebvshl e0 e1, or a logical right-shift operation Ebvlshr e0 e1. For
arithmetic operations, there are Ebvneg e0 for negation, Ebvadd e0 e1 for addition,
Ebvmul e0 e1 for multiplication, Ebvudiv e0 e1 for unsigned division, and
Ebvurem e0 e1 for unsigned remainder with e0, e1 : exp. Finally, the extraction
Eextract i j e0 and the concatenation Econcat e0 e1 operations have the data
type exp with i, j : nat and e0, e1 : exp.


5.2 Semantics of SMT QF_BV Queries


In our Coq formalization, an SMT QF_BV query is interpreted on stores. A
store is a mapping from QF_BV variables to bits. Let σ be a store. The
interpretation of be : bexp on σ is a Boolean value; the interpretation of e : exp on σ
is a bit-vector. Semantic functions eval_bexp and eval_exp are as follows.


Fixpoint eval_bexp (be : bexp) (σ : store) : bool :=
  match be with
  | Bfalse => false
  | Btrue => true
  | Bnot be0 => ~~ (eval_bexp be0 σ)
  | Band be0 be1 => (eval_bexp be0 σ) && (eval_bexp be1 σ)
  | Bor be0 be1 => (eval_bexp be0 σ) || (eval_bexp be1 σ)
  | Bbveq e0 e1 => (eval_exp e0 σ) == (eval_exp e1 σ)
  | Bbvult e0 e1 => ltB (eval_exp e0 σ) (eval_exp e1 σ)
  (* other QF_BV predicates *)
  end
with eval_exp (e : exp) (σ : store) : bits :=
  match e with
  | Evar v => Store.acc v σ
  | Econst bv => bv
  | Ebvnot e0 => invB (eval_exp e0 σ)
  | Ebvand e0 e1 => andB (eval_exp e0 σ) (eval_exp e1 σ)
  | Ebvor e0 e1 => orB (eval_exp e0 σ) (eval_exp e1 σ)
  | Ebvshl e0 e1 => shlB (to_nat (eval_exp e1 σ)) (eval_exp e0 σ)
  | Ebvlshr e0 e1 => shrB (to_nat (eval_exp e1 σ)) (eval_exp e0 σ)
  | Ebvneg e0 => negB (eval_exp e0 σ)
  | Ebvadd e0 e1 => addB (eval_exp e0 σ) (eval_exp e1 σ)
  | Ebvmul e0 e1 => mulB (eval_exp e0 σ) (eval_exp e1 σ)
  | Ebvudiv e0 e1 => divB (eval_exp e0 σ) (eval_exp e1 σ)
  | Ebvurem e0 e1 => remB (eval_exp e0 σ) (eval_exp e1 σ)
  | Eextract i j e0 => extract i j (eval_exp e0 σ)
  | Econcat e0 e1 => (eval_exp e1 σ) ++ (eval_exp e0 σ)
  | Ebvsub e0 e1 => subB (eval_exp e0 σ) (eval_exp e1 σ)
  (* other QF_BV operations *)
  end.


An SMT QF_BV query denotes a value in the Coq data type bool. Bfalse
and Btrue denote false and true respectively. Boolean negation, conjunction,
and disjunction correspond to ~~, &&, and || in bool respectively. For QF_BV
predicates, the bit-vector equality Bbveq is interpreted by the equality == for
Boolean sequences. The coq-nbits function ltB is used to interpret Bbvult.

A QF_BV expression denotes a bit-vector. For basic cases, QF_BV variables
are interpreted by corresponding bit-vectors in the store σ through the store
access function Store.acc; bit-vector constants are interpreted by themselves.
Bitwise logical operations Ebvnot, Ebvand, and Ebvor are interpreted by
corresponding coq-nbits functions invB, andB, and orB respectively. For logical
shift operations, the offset e1 is first converted to a natural number through
to_nat (eval_exp e1 σ) and then passed to the corresponding logical shift
functions shlB or shrB in coq-nbits. QF_BV arithmetic operations are interpreted by
corresponding coq-nbits arithmetic functions as expected. Finally, the extraction
Eextract and concatenation Econcat operations are interpreted by extract
and ++ in coq-nbits respectively. Since bit-vectors are stored least significant
bit first, Econcat e0 e1 places the bits of e1 before those of e0 in the sequence.

In an SMT QF_BV query, a QF_BV variable designates a bit-vector of a
certain width. An SMT QF_BV query is hence associated with a signature Σ
mapping QF_BV variables to their respective widths. A store σ conforms to a
signature Σ if the interpretation of each QF_BV variable on σ has the same width
as specified in Σ. Given an SMT QF_BV query be : bexp with its signature Σ,
be is satisfiable if there is a store σ conforming to Σ and eval_bexp be σ = true.
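
The definition of satisfiability can be made concrete with a tiny evaluator and an exhaustive search over conforming stores. The Python sketch below is an illustration under several assumptions: queries are encoded as nested tuples, only a handful of constructors is covered, bvadd is modeled through natural numbers, and find_store is a hypothetical name. Exhaustive enumeration is exponential and only meant to illustrate the definition.

```python
from itertools import product

def to_N(bv):
    return sum(1 << i for i, b in enumerate(bv) if b)

def from_N(w, n):
    return [((n >> i) & 1) == 1 for i in range(w)]

def eval_exp(e, store):
    tag = e[0]
    if tag == "var":
        return store[e[1]]
    if tag == "const":
        return e[1]
    if tag == "bvnot":
        return [not b for b in eval_exp(e[1], store)]
    if tag == "bvadd":
        x, y = eval_exp(e[1], store), eval_exp(e[2], store)
        return from_N(len(x), to_N(x) + to_N(y))
    raise ValueError(f"unsupported expression: {tag}")

def eval_bexp(be, store):
    tag = be[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "not":
        return not eval_bexp(be[1], store)
    if tag == "and":
        return eval_bexp(be[1], store) and eval_bexp(be[2], store)
    if tag == "or":
        return eval_bexp(be[1], store) or eval_bexp(be[2], store)
    if tag == "bveq":
        return eval_exp(be[1], store) == eval_exp(be[2], store)
    if tag == "bvult":
        return to_N(eval_exp(be[1], store)) < to_N(eval_exp(be[2], store))
    raise ValueError(f"unsupported predicate: {tag}")

def find_store(be, signature):
    # Enumerate every store conforming to the signature (name -> width).
    names = sorted(signature)
    domains = [[from_N(signature[v], n) for n in range(1 << signature[v])]
               for v in names]
    for values in product(*domains):
        store = dict(zip(names, values))
        if eval_bexp(be, store):
            return store
    return None

x, y = ("var", "x"), ("var", "y")
query = ("and",
         ("bveq", ("bvadd", x, y), ("const", from_N(2, 0))),
         ("bvult", x, y))
witness = find_store(query, {"x": 2, "y": 2})
print(witness)
```

With two-bit variables, the query asks for x + y = 0 modulo 4 with x below y; the search returns a store when one exists and None otherwise.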


5.3 Derived QF_BV Operations and Predicates


In the QF_BV logic of SMT-LIB, a number of QF_BV operations and predicates
are derived from a small set of core operations and predicates. Consider
the signed comparison predicate bvslt bv cv in SMT-LIB:


bvslt bv cv ≜ (or (and (= (extract (w − 1) (w − 1) bv) #b1)
                       (= (extract (w − 1) (w − 1) cv) #b0))
                  (and (= (extract (w − 1) (w − 1) bv)
                          (extract (w − 1) (w − 1) cv))
                       (bvult bv cv))).


To compare two bit-vectors of width w in two’s complement representation,
the sign bits are checked. If bv is negative but cv is non-negative, bvslt bv cv is true.
Otherwise, the signed predicate checks that both operands have the same sign
and compares the operands using the unsigned comparison predicate. Interestingly,
the arithmetic subtraction operation is actually a derived operation in
SMT-LIB: bvsub bv cv ≜ bvadd bv (bvneg cv). The arithmetic operation is
defined to be the bit-vector sum of the minuend and the negation of the subtrahend.
It is not, for instance, defined as nat2bv(w, bv2nat(bv) − bv2nat(cv)) because
bv2nat(bv) − bv2nat(cv) may not be a natural number.
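
The derived definition of the signed comparison can be compared against the integer ordering by enumeration. The Python sketch below assumes LSB-first Boolean lists and checks all operand pairs up to width 4; it is an illustrative test, not the Coq equivalence proof.

```python
from itertools import product

def to_N(bv):
    return sum(1 << i for i, b in enumerate(bv) if b)

def to_Z(bv):
    # Two's complement value; the sign bit is the last list element.
    return to_N(bv) - (1 << len(bv)) if bv[-1] else to_N(bv)

def bvslt(bv, cv):
    # Derived definition: compare sign bits first, then fall back to the
    # unsigned comparison when both signs agree.
    sb, sc = bv[-1], cv[-1]
    return (sb and not sc) or (sb == sc and to_N(bv) < to_N(cv))

def all_bits(w):
    return [list(t) for t in product([False, True], repeat=w)]

agree = all(bvslt(bv, cv) == (to_Z(bv) < to_Z(cv))
            for w in range(1, 5)
            for bv in all_bits(w)
            for cv in all_bits(w))
print(agree)
```

When both sign bits agree, the unsigned order and the two's complement order coincide, which is why the fallback to bvult is sound.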

For derived operations and predicates, there is a subtle yet important
difference between our formal semantics and those defined in SMT-LIB. In our
formal bit-vector theory coq-nbits, most functions and predicates are defined
directly. In particular, the arithmetic subtraction function subB is defined by
one-bit subtractors in coq-nbits. Our formal semantics for the QF_BV arithmetic
operation bvsub therefore is defined by the corresponding bit-vector function
subB. Since our formal semantics does not define bvsub by bvadd and bvneg, it
could be different from those in SMT-LIB. In order to build a certified solver
for the QF_BV logic of SMT-LIB, it is necessary to establish semantic
equivalences between both semantic definitions for all derived QF_BV operations and
predicates.

To justify our formal semantics, we show the semantics of our definitions and
those of SMT-LIB indeed denote the same bit-vector functions or predicates.
Consider again the subtraction operation. Recall the semantics of the arithmetic
operations bvadd and bvneg are defined by the bit-vector functions addB and
negB respectively. The next lemma is useful to show the semantic equivalence:


Lemma 6. ∀bv cv, size bv = size cv ⟹ subB bv cv = addB bv (negB cv).


For all derived QF_BV operations and predicates, we give Coq proofs for the
equivalence between our formal semantics and those of SMT-LIB. In particular,
semantics of all QF_BV arithmetic operations and predicates over two’s
complement representation are equivalent to those in SMT-LIB. Our formal semantics
for QF_BV queries is thus certified to be equivalent to SMT-LIB.
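
The content of Lemma 6 can be illustrated with two independent bit-level circuits. In the Python sketch below, subB is a ripple-borrow subtractor built from one-bit full subtractors, and negB is modeled as bitwise inversion followed by adding one; both bodies are assumptions made for illustration, since the coq-nbits definitions are not reproduced in the text.

```python
from itertools import product

def to_N(bv):
    return sum(1 << i for i, b in enumerate(bv) if b)

def addB(bv, cv):
    # Ripple-carry adder; the final carry is dropped.
    out, carry = [], False
    for x, y in zip(bv, cv):
        out.append(x ^ y ^ carry)
        carry = (x and y) or (carry and (x ^ y))
    return out

def negB(bv):
    # Two's complement negation, assumed here to be invert-and-increment.
    one = [i == 0 for i in range(len(bv))]
    return addB([not b for b in bv], one)

def subB(bv, cv):
    # Ripple-borrow subtractor built from one-bit full subtractors.
    out, borrow = [], False
    for x, y in zip(bv, cv):
        out.append(x ^ y ^ borrow)
        borrow = ((not x) and y) or ((not (x ^ y)) and borrow)
    return out

def all_bits(w):
    return [list(t) for t in product([False, True], repeat=w)]

lemma6_holds = all(subB(bv, cv) == addB(bv, negB(cv))
                   for w in range(5)
                   for bv in all_bits(w)
                   for cv in all_bits(w))
print(lemma6_holds)
```

The direct subtractor uses a single pass over the operands, whereas the derived form needs an inversion, an increment, and an addition; the two agree on every input checked.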


6 Certified Bit Blasting 


Recall that a SAT query is a Boolean CNF formula. Given an SMT QF_BV
query, a bit blasting algorithm computes a SAT query that is satisfiable if and
only if the given SMT QF_BV query is satisfiable. Although it is the standard
technique for solving SMT QF_BV queries, bit blasting can be very complex
due to arithmetic operations and various optimizations. Bit blasting algorithms
therefore can be tedious to construct and thus prone to errors. We verify a bit
blasting algorithm for SMT QF_BV queries using our Coq formalization.

Let us start with a simple formalization of Boolean CNF formulae. In our 
formalization, a clause is represented by a sequence of literals; a CNF formula 
in turn is represented by a sequence of clauses. Let bvar be the data type for 
Boolean variables. We have the following data types in Coq:


Inductive lit : Set := Pos of bvar | Neg of bvar. 
Definition clause : Set := seq lit. 
Definition CNF : Set := seq clause. 


Define an environment ε to be a mapping from bvar to bool. Given a literal
ℓ, a CNF formula f, and an environment ε, it is straightforward to define the
semantic functions eval_lit ℓ ε : bool and eval_cnf f ε : bool. A SAT query
f is satisfiable if there is an environment ε such that eval_cnf f ε = true.

To illustrate how our Coq proof works, consider Tseitin transformation for
the logical negation operation: 


Definition bit_blast_Bnot ℓ : lit * CNF :=
  let r := a fresh literal in
  (r, [:: [:: r; ℓ]; [:: ¬r; ¬ℓ]]).


Given a literal ℓ, bit_blast_Bnot ℓ returns a new literal r and the CNF
formula (r ∨ ℓ) ∧ (¬r ∨ ¬ℓ). Tseitin transformation ensures the interpretations of
ℓ and r are complementary on any environment ε evaluating the CNF formula
to true. We give a formal proof using our formalization in Coq:


Lemma 7. ∀r cnf ℓ ε, (r, cnf) = bit_blast_Bnot ℓ ⟹ eval_cnf cnf ε =
true ⟹ eval_lit r ε = ~~ (eval_lit ℓ ε).
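
Lemma 7 can be checked by enumerating all environments over the two variables involved. The Python sketch below assumes literals are encoded as non-zero integers in the DIMACS style (v for a positive literal, -v for a negative one) and that the fresh literal is supplied by the caller; both choices are illustrative and are not the paper's Coq encoding.

```python
from itertools import product

def eval_lit(lit, env):
    value = env[abs(lit)]
    return value if lit > 0 else not value

def eval_cnf(cnf, env):
    # A CNF formula holds if every clause has at least one true literal.
    return all(any(eval_lit(l, env) for l in clause) for clause in cnf)

def bit_blast_Bnot(l, fresh):
    # Tseitin encoding of r <-> not l as (r or l) and (not r or not l).
    r = fresh
    return r, [[r, l], [-r, -l]]

l = 1
r, cnf = bit_blast_Bnot(l, fresh=2)

models = []
for vals in product([False, True], repeat=2):
    env = dict(zip((1, 2), vals))
    if eval_cnf(cnf, env):
        models.append(env)

print(all(eval_lit(r, env) == (not eval_lit(l, env)) for env in models))
```

Exactly the environments in which r and the input literal disagree satisfy the two clauses.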


The idea is generalized to QF_BV operations naturally. For each QF_BV
operation, we construct a literal sequence rs and a Boolean CNF formula cnf.
If cnf evaluates to true on an environment ε, the interpretation of rs on ε needs
to reflect the semantics of the QF_BV operation. For instance, a Coq proof is
given for the QF_BV addition operation:


Lemma 8. ∀rs cnf ls0 ls1 ε, (rs, cnf) = bit_blast_Ebvadd ls0 ls1 ⟹
eval_cnf cnf ε = true ⟹ eval_lits rs ε = addB (eval_lits ls0 ε) (eval_lits ls1 ε).


Given two literal sequences ls0 and ls1, bit_blast_Ebvadd ls0 ls1 returns a literal
sequence rs and a CNF formula cnf. If cnf evaluates to true on an environment ε,
then the interpretation of the literal sequence rs on ε is indeed the bit-vector sum
of the interpretations of ls0 and ls1 on ε. Bit blasting algorithms for other QF_BV
operations are given and shown to reflect the semantics of corresponding
functions defined in the bit-vector theory coq-nbits. In particular, our bit blasting
algorithms for arithmetic division and remainder correctly reflect corresponding
arithmetic bit-vector functions in coq-nbits.
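
The statement of Lemma 8 can be illustrated with a small bit blaster for addition. The Python sketch below is one possible encoding, not the paper's algorithm: it assumes integer literals in the DIMACS style, a ripple-carry adder, and a naive truth-table Tseitin encoding (the hypothetical helper define) that emits one clause per input row. The function check enumerates all environments for a given width and asserts the lemma's conclusion on every satisfying one.

```python
from itertools import product

def eval_lit(lit, env):
    value = env[abs(lit)]
    return value if lit > 0 else not value

def eval_lits(lits, env):
    return [eval_lit(l, env) for l in lits]

def eval_cnf(cnf, env):
    return all(any(eval_lit(l, env) for l in clause) for clause in cnf)

def define(out, ins, fn):
    # Truth-table Tseitin encoding of out <-> fn(ins): for each input row,
    # one clause forces out to take the value fn(row) whenever ins match row.
    cnf = []
    for row in product([False, True], repeat=len(ins)):
        mismatch = [-l if v else l for l, v in zip(ins, row)]
        cnf.append(mismatch + [out if fn(row) else -out])
    return cnf

def bit_blast_Ebvadd(ls0, ls1, next_var):
    # Ripple-carry adder: a half adder for bit 0, full adders afterwards.
    rs, cnf, carry = [], [], None
    for x, y in zip(ls0, ls1):
        ins = [x, y] if carry is None else [x, y, carry]
        s, c = next_var, next_var + 1
        next_var += 2
        cnf += define(s, ins, lambda row: sum(row) % 2 == 1)   # sum bit
        cnf += define(c, ins, lambda row: sum(row) >= 2)       # carry out
        rs.append(s)
        carry = c
    return rs, cnf

def addB(bv, cv):
    # Reference semantics on LSB-first Boolean lists.
    out, carry = [], False
    for x, y in zip(bv, cv):
        out.append(x ^ y ^ carry)
        carry = (x and y) or (carry and (x ^ y))
    return out

def check(width):
    ls0 = list(range(1, width + 1))
    ls1 = list(range(width + 1, 2 * width + 1))
    rs, cnf = bit_blast_Ebvadd(ls0, ls1, 2 * width + 1)
    num_vars = 4 * width
    satisfying = 0
    for vals in product([False, True], repeat=num_vars):
        env = dict(zip(range(1, num_vars + 1), vals))
        if eval_cnf(cnf, env):
            assert eval_lits(rs, env) == addB(eval_lits(ls0, env),
                                              eval_lits(ls1, env))
            satisfying += 1
    return satisfying

print(check(2))
```

The value returned by check is the number of satisfying environments; in this encoding each assignment to the input literals extends to exactly one satisfying environment.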

Recall that the semantics for SMT QF_BV queries is defined over stores for
QF_BV variables. In order to prove the correctness of bit blasting algorithms,
one has to relate stores for QF_BV variables with environments for Boolean
variables. The relation is explicated through literal correspondences. A literal
correspondence π is a mapping from QF_BV variables to sequences of literals.
For each QF_BV variable v, the literal sequence π(v) is meant to interpret v on
environments for Boolean variables. More formally, let eval_lits ls ε : bits be
the bit-vector for the literal sequence ls interpreted on the environment ε. The
bit-vector eval_lits π(v) ε is hence the interpretation of the QF_BV variable v
on the environment ε. Let σ be a store and π a literal correspondence. An
environment ε is consistent with σ through π if the bit-vectors eval_lits π(v) ε and
Store.acc v σ are equal for every QF_BV variable v in σ. Thus, an environment
is consistent with a store if their interpretations of variables coincide.

It is now straightforward to give our bit blasting algorithm for SMT QF_BV 
queries. For each QF_BV expression, our algorithm first computes literals and 
CNF formulae for operands recursively. It then invokes an auxiliary bit blasting 
algorithm to construct result literals and a CNF formula for the QF_BV oper- 
ation. The literal correspondence is also updated when literals are allocated for 
QF_BV variables. Finally, the result literals and the updated literal correspon- 
dence are returned along with the concatenation of all CNF formulae. 


Fixpoint bit_blast_bexp Σ π be : lit * correspondence * CNF :=
  match be with
  | Bnot be0 =>
    let (r0, π', cnf0) := bit_blast_bexp Σ π be0 in
    let (r, cnf) := bit_blast_Bnot r0 in
    (r, π', cnf ++ cnf0)
  (* other QF_BV predicates *)
  end
with bit_blast_exp Σ π e : seq lit * correspondence * CNF :=
  match e with
  | Evar v =>
    if π(v) is defined then (π(v), π, [::])
    else let rs := fresh literals for v according to Σ in
         let π' := update π with v ↦ rs in
         (rs, π', [::])
  | Ebvadd e0 e1 =>
    let (rs0, π', cnf0) := bit_blast_exp Σ π e0 in
    let (rs1, π'', cnf1) := bit_blast_exp Σ π' e1 in
    let (rs, cnf) := bit_blast_Ebvadd rs0 rs1 in
    (rs, π'', cnf ++ cnf0 ++ cnf1)
  (* other QF_BV operations *)
  end.


The following Coq theorem establishes the connection between the output
literals and the input SMT QF_BV query or expression of the algorithm. 


Theorem 1. Let be : bexp be an SMT QF_BV query with the signature Σ_be,
e : exp a QF_BV expression with the signature Σ_e, and π0 the empty literal
correspondence.

1. ∀r π cnf σ ε, (r, π, cnf) = bit_blast_bexp Σ_be π0 be ⟹ σ conforms to Σ_be
   ⟹ ε is consistent with σ through π ⟹ eval_cnf cnf ε = true ⟹
   eval_lit r ε = eval_bexp be σ.

2. ∀rs π cnf σ ε, (rs, π, cnf) = bit_blast_exp Σ_e π0 e ⟹ σ conforms to Σ_e
   ⟹ ε is consistent with σ through π ⟹ eval_cnf cnf ε = true ⟹
   eval_lits rs ε = eval_exp e σ.


Let be be an SMT QF_BV query with the signature Σ_be, r and cnf the
literal and CNF formula returned by bit_blast_bexp respectively. Consider any
store conforming to Σ_be and any environment consistent with the store. If the
environment evaluates the formula cnf to true, Theorem 1 says that the literal r
and the SMT QF_BV query be evaluate to the same Boolean value on the
environment and store respectively. In other words, the algorithm bit_blast_bexp
is a generalized Tseitin transformation for SMT QF_BV queries. In particular,
all QF_BV arithmetic operations (addition, subtraction, multiplication, division,
and remainder in the unsigned and two’s complement representations) are
transformed to CNF formulae with formal proofs of correctness in Coq.

A useful corollary to Theorem 1 is the reduction of the satisfiability of
SMT QF_BV queries to the satisfiability of SAT queries.


Corollary 1. Let be : bexp be an SMT QF_BV query with the signature Σ_be
and π0 the empty literal correspondence. Then


∀r π cnf, (r, π, cnf) = bit_blast_bexp Σ_be π0 be ⟹
  [(∃σ, σ conforms to Σ_be ∧ eval_bexp be σ = true) ⟺
   (∃ε, eval_cnf ([:: [:: r]] ++ cnf) ε = true)].


Corollary 1 gives the formal proof of correctness for our bit blasting algorithm
bit_blast_bexp. Let be be an arbitrary SMT QF_BV query, r and cnf the literal
and the CNF formula returned by the algorithm. The corollary shows that the
query be is satisfiable if and only if the SAT query r ∧ cnf is satisfiable. An
equi-satisfiable SAT query is indeed obtained from the bit blasting algorithm
on every input SMT QF_BV query with a formal proof of correctness.

Recall that several QF_BV operations and predicates are derived from a
small number of operations and predicates in SMT-LIB. A naive bit blasting
algorithm could expand derived operations or predicates, and then perform bit
blasting on a small set of operations and predicates. Such an algorithm would
have a simpler proof of correctness but generate more intermediate literals and
clauses. For instance, the naive algorithm for bvsub would perform bit blasting on
bvneg followed by bvadd with intermediate literals and clauses. Our bit blasting
algorithm for bvsub on the other hand reflects our semantics defined by the
bit-vector function subB. Intermediate literals or clauses are not needed. Our bit
blasting algorithm hence transforms bvsub more economically than the naive
algorithm.

To improve our bit blasting algorithm further, a cache for QF_BV expressions
and predicates is added. In large queries, QF_BV expressions and predicates can
occur a number of times. If a QF_BV expression has several occurrences, our
basic bit blasting algorithm will generate result literals and CNF formulae for
each occurrence. Consider the SMT QF_BV query


(and (bvslt #b1000 (bvadd x y)) (bvslt (bvadd x y) #b0111)).


The query checks whether the sum of the QF_BV variables x and y can be in
a proper range. Since the Boolean predicate and has two operands, our basic 
algorithm invokes the auxiliary bit blasting algorithm for the two comparison 
predicates. It in turn blasts the same expression bvadd x y twice. Repeated bit 
blasting on the same expression or predicate is redundant. A hash function can 
detect repeated QF_BV expressions and predicates easily. When an expression 
or a predicate recurs, the previously computed literals with the empty CNF 
formula are returned from a cache as the result. More importantly, we give a 
formal Coq proof of Corollary 1 for the bit blasting algorithm with a cache. 
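
The effect of the cache can be seen on the repeated sub-expression bvadd x y. The Python sketch below is a toy bit blaster written for illustration: the class Blaster, its tuple-based expression encoding, and the truth-table helper define are all assumptions introduced here, and Python's built-in hashing of tuples stands in for the hash function mentioned above.

```python
from itertools import product

def define(out, ins, fn):
    # Truth-table Tseitin encoding of out <-> fn(ins), one clause per row.
    cnf = []
    for row in product([False, True], repeat=len(ins)):
        mismatch = [-l if v else l for l, v in zip(ins, row)]
        cnf.append(mismatch + [out if fn(row) else -out])
    return cnf

class Blaster:
    def __init__(self, use_cache):
        self.next_var = 1
        self.vars = {}                       # literal correspondence
        self.cache = {} if use_cache else None

    def fresh(self, n):
        lits = list(range(self.next_var, self.next_var + n))
        self.next_var += n
        return lits

    def add(self, ls0, ls1):
        rs, cnf, carry = [], [], None
        for a, b in zip(ls0, ls1):
            ins = [a, b] if carry is None else [a, b, carry]
            s, c = self.fresh(2)
            cnf += define(s, ins, lambda row: sum(row) % 2 == 1)
            cnf += define(c, ins, lambda row: sum(row) >= 2)
            rs.append(s)
            carry = c
        return rs, cnf

    def blast(self, e):
        # On a cache hit, return the stored literals with the empty CNF.
        if self.cache is not None and e in self.cache:
            return self.cache[e], []
        if e[0] == "var":
            _, name, width = e
            if name not in self.vars:
                self.vars[name] = self.fresh(width)
            rs, cnf = self.vars[name], []
        elif e[0] == "bvadd":
            ls0, cnf0 = self.blast(e[1])
            ls1, cnf1 = self.blast(e[2])
            rs, cnf = self.add(ls0, ls1)
            cnf = cnf + cnf0 + cnf1
        else:
            raise ValueError(f"unsupported expression: {e[0]}")
        if self.cache is not None:
            self.cache[e] = rs
        return rs, cnf

x, y = ("var", "x", 4), ("var", "y", 4)
s = ("bvadd", x, y)

plain = Blaster(use_cache=False)
p_first, p_second = plain.blast(s), plain.blast(s)

cached = Blaster(use_cache=True)
c_first, c_second = cached.blast(s), cached.blast(s)

print(len(p_first[1]), len(p_second[1]), len(c_first[1]), len(c_second[1]))
```

Without the cache, the second occurrence allocates new result literals and emits the adder clauses again; with the cache, it reuses the earlier literals and contributes no clauses.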


7 A Certified SMT QF_BV Solver


We have so far built a formally verified bit blasting algorithm for SMT QF_BV
queries. Using the code extraction mechanism in Coq, an OCaml program
corresponding to the verified bit blasting algorithm is obtained. Using a SAT
solver and a SAT certificate checker, a certified SMT QF_BV solver can be
constructed. Figure 1 gives the flow of our certified solver.

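[Figure: flow diagram. The query be : bexp enters the verified OCaml program,
which produces cnf : CNF for the SAT solver. The solver either reports SAT, or
emits an UNSAT certificate that the certificate checker validates before UNSAT
is reported.]

Fig. 1. Certified SMT QF_BV Solver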


In the figure, the extracted OCaml program takes an OCaml expression be
of the type bexp as an input (Sect. 5). The verified program performs bit blasting
on the SMT QF_BV query and returns an OCaml expression cnf of the type
lit list list representing a SAT query (Sect. 6). Precisely, an OCaml term
of the type lit represents a literal. The OCaml type lit list corresponds
to the data type for clauses; and the type lit list list corresponds to the
data type for CNF formulae. The expression cnf is sent to a SAT solver to
check satisfiability. If the SAT solver reports SAT, the SMT QF_BV query
represented by be is satisfiable. Otherwise, the SAT solver reports UNSAT with
a certificate. The certificate is sent to a SAT certificate checker for validation.
If it is validated, the SMT QF_BV query be is unsatisfiable with certification.


8 Experiments


In order to evaluate the performance of our verified OCaml bit blasting
program, we instantiate our SMT QF_BV solver CoqQFBV based on Fig. 1 as
follows. We write an OCaml parser to translate a text file in the SMT-LIB
format to an SMT QF_BV query in our formal syntax. The query is sent to
the verified OCaml program for bit blasting. We then add an OCaml
program to transform the output SAT query to a text file in the DIMACS format.
The 2020 SAT Competition winner Kissat [5] is used to check the satisfiability
of the SAT query. If the SAT solver reports UNSAT with a certificate in
the DRAT format [31], the certificate is sent to the verified certificate checker
GRATchk [16] for validation. Certificate checkers for SAT solvers use much
simpler algorithms than certificate checkers for SMT solvers. They are hence easier
to build and prove correct. The correctness of GRATchk is in fact verified by the
proof assistant Isabelle [22]. We need not trust the certificate checker either.
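
The DIMACS output step is simple enough to sketch. The Python function below is an illustration only (the paper's converter is an OCaml program that is not shown); it assumes clauses are lists of non-zero integers and that the variable count is the largest variable index occurring in the formula.

```python
def to_dimacs(cnf):
    # Header "p cnf <variables> <clauses>", then one clause per line,
    # each terminated by 0.
    num_vars = max((abs(l) for clause in cnf for l in clause), default=0)
    lines = [f"p cnf {num_vars} {len(cnf)}"]
    for clause in cnf:
        lines.append(" ".join([*map(str, clause), "0"]))
    return "\n".join(lines) + "\n"

# The Tseitin clauses for r <-> not l with l = 1 and r = 2.
print(to_dimacs([[2, 1], [-2, -1]]), end="")
```

The resulting text can be written to a file and handed to any SAT solver that reads the DIMACS CNF format.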

We ran two experiments to evaluate our certified SMT QF_BV solver. The
first experiment is the QF_BV division of the single query track in the 2020
SMT Competition [2]. The second experiment consists of verification problems
from various assembly implementations for linear field arithmetic in cryptography
libraries such as OpenSSL [30], RELIC [1], and BLST [29]. We compare
CoqQFBV against three SMT QF_BV solvers: CVC4 [4] with an LFSC
certificate checker [27], the 2020 SMT QF_BV division winner Bitwuzla [20], and the
2019 SMT QF_BV division winner Boolector [21]. Bitwuzla and Boolector
are designed for efficiency without certification. CVC4 provides an LFSC
certificate checker implemented in C [26]. The certificate checker can validate
certificates from different theories but is itself not verified. All experiments were
run on a Linux machine with a 3.20 GHz CPU and 1 TB memory.¹


8.1 SMT QF_BV Competition


The first experiment is running our certified solver CoqQFBV on tasks from
the QF_BV division of the 2020 SMT Competition. We set a 60 GB memory limit
and a 20 min timeout for each task as in the competition. A task solves a single
SMT-LIB file sequentially. The SMT QF_BV division contains 6861 files in the
SMT-LIB format. All files are marked with unsat, sat, or unknown indicating
expected query results. To save running time, we ran 10 tasks concurrently. The
experimental results are summarized in Table 1.

In the table, the column N_SC indicates the number of solved tasks with
certification. O_SC is the number of timeouts. E_SC shows the number of unsolved
tasks due to tool errors. T_SC is the average time for solved tasks. CoqQFBV
solves 6087 (88.72%) tasks and CVC4 with its certificate checker solves 3840 (55.97%)
tasks with certification. We observe three stack overflow errors during bit blasting in
CoqQFBV. These errors are induced by deep recursion. Among 328 errors from
CVC4, 249 are segmentation faults raised by the LFSC certificate checker.


¹ CoqQFBV is available at https://github.com/fmlab-iis/coq-qfbv.git. 
Table 1. Experimental results on the 2020 SMT QF_BV division 


Tool      | N_SC          | O_SC | E_SC | T_SC   | N_S           | O_S  | E_S | T_S 
CoqQFBV   | 6087 (88.72%) | 771  | 3    | 119.69 | 6169 (89.91%) | 689  | 3   | 81.74 
CVC4      | 3840 (55.97%) | 2693 | 328  | 74.63  | 4255 (62.02%) | 2544 | 62  | 56.87 
Bitwuzla  | -             | -    | -    | -      | 6739 (98.22%) | 122  | 0   | 16.09 
Boolector | -             | -    | -    | -      | 6719 (97.93%) | 142  | 0   | 15.44 


Table 2. Experimental results on the 2020 SMT QF_BV division by categories 


Tool      | N_SC          | T_SC   | P_SC      | N_S           | T_S 

4238 unsat tasks 
CoqQFBV   | 3838 (90.56%) | 146.72 | 291.35 MB | 3920 (92.50%) | 86.51 
CVC4      | 1762 (41.58%) | 86.68  | 266.61 MB | 2177 (51.37%) | 49.68 
Bitwuzla  | -             | -      | -         | 4188 (98.82%) | 12.75 
Boolector | -             | -      | -         | 4180 (98.63%) | 11.72 

2553 sat tasks 
CoqQFBV   | -             | -      | -         | 2242 (87.82%) | 73.26 
CVC4      | -             | -      | -         | 2078 (81.39%) | 64.41 
Bitwuzla  | -             | -      | -         | 2524 (98.86%) | 21.08 
Boolector | -             | -      | -         | 2516 (98.55%) | 21.31 

70 unknown tasks 
CoqQFBV   | 5 (7.14%)     | 173.17 | 203.52 MB | 7 (10.00%)    | 128.26 
CVC4      | 0 (0.00%)     | -      | -         | 0 (0.00%)     | - 
Bitwuzla  | -             | -      | -         | 27 (38.57%)   | 66.36 
Boolector | -             | -      | -         | 23 (32.86%)   | 48.58 


The same table also compares against efficient but uncertified solvers. To 
evaluate the overhead from certificate checking, the two certified solvers 
CoqQFBV and CVC4 still generate certificates but do not validate them. The 
column N_S gives the number of solved tasks without certification. O_S is the 
number of timeouts. E_S indicates the number of errors, and T_S is the average 
time for solved tasks. Our certified solver CoqQFBV finishes 6169 (89.91%) 
tasks. The CVC4 solver finishes 4255 (62.02%) tasks. CoqQFBV and CVC4 
solve 82 (= 6169 - 6087) and 415 (= 4255 - 3840) more tasks without 
certification respectively. Since our bit blasting algorithm is verified for all inputs, 
CoqQFBV does not certify bit blasting on each query and hence induces less 
overhead. The 2020 and 2019 SMT QF_BV division winners Bitwuzla and 
Boolector finish 6739 (98.22%) and 6719 (97.93%) tasks without certification 
respectively. CoqQFBV with certification solves about 10% fewer tasks than the 
2020 track winner Bitwuzla without certification. It also performs significantly 
better than CVC4 with a general SMT certificate checker. 

Table 2 compares the four solvers by tasks from the three expected query 
results. Among the 4238 unsat tasks, CoqQFBV and CVC4 give certified 
answers to 3838 (90.56%) and 1762 (41.58%) of them respectively. The 
column P_SC gives the average size of certificates. Efficient solvers Bitwuzla and 
Boolector give 4188 (98.82%) and 4180 (98.63%) uncertified answers 
respectively. 

Among the 2553 sat tasks, Bitwuzla and Boolector finish 2524 (98.86%) 
and 2516 (98.55%) of them respectively. CoqQFBV and CVC4 solve only 2242 
(87.82%) and 2078 (81.39%) sat tasks respectively. For the 70 tasks marked 
unknown, Bitwuzla and Boolector respectively answer 27 (38.57%) and 
23 (32.86%) of them without certification. Our certified SMT QF_BV solver 
answers seven of them: two are found to be sat and five unsat. Answers to the 
five unsat tasks are all certified. CVC4 with its certificate checker fails to solve 
any unknown task. For the benchmarks from the 2020 SMT QF_BV division, our 
certified solver CoqQFBV appears to be more scalable than CVC4 with its 
general SMT certificate checker. 


Table 3. Average time for CoqQFBV components 


Task    | T_BB  | T_SAT  | T_CERT 
unsat   | 41.84 | 49.92  | 73.51 
sat     | 37.08 | 62.09  | - 
unknown | 32.34 | 121.99 | 62.86 


Table 3 further decomposes the time spent on different components in 
CoqQFBV. The column T_BB gives the average time for our verified OCaml bit 
blasting program; T_SAT gives the average time used by the SAT solver Kissat; 
and T_CERT contains the average time for the certificate checker GRATchk. For 
the tasks in the QF_BV division, the times for SAT solving and certificate 
checking are comparable. In comparison, the OCaml bit blasting program seems to 
take an unexpectedly large amount of time and hence can still be improved. 


8.2 Linear Field Arithmetic in Cryptography 


Table 4. Experimental results on cryptographic assembly program verification 


Tool      | N_SC        | T_SC   | P_SC      | N_S          | T_S 
CoqQFBV   | 93 (96.88%) | 121.42 | 168.45 MB | 93 (96.88%)  | 68.96 
CVC4      | 19 (19.79%) | 6.66   | 267.92 MB | 46 (47.92%)  | 40.16 
Bitwuzla  | -           | -      | -         | 88 (91.67%)  | 16.07 
Boolector | -           | -      | -         | 96 (100.00%) | 18.25 


In this section, we evaluate our certified SMT QF_BV solver on benchmarks from 
real-world assembly implementations in various cryptography libraries such as 
OpenSSL [30], RELIC [1], and BLST [29]. In elliptic curve cryptography, 
arithmetic operations over large finite fields are needed. A field element is typically 
represented by hundreds of bits. A field arithmetic operation takes two field 
elements and returns a field element as the result. In the signature scheme Ed25519 
used in OpenSSH, for instance, a field element belongs to the residue system 
modulo the prime number $2^{255} - 19$. The field sum of two field elements is obtained 
by the arithmetic sum modulo $2^{255} - 19$. Commodity processors, however, do not 
support arithmetic instructions with operands of hundreds of bits natively. Field 
arithmetic has to be implemented by 32- or 64-bit instructions. The functional 
specification of the field addition used in Ed25519 may look as follows. 


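\[
\begin{array}{l}
\left\{\; \sum_{i=0}^{3} a_i \times 2^{64 i} < 2^{255} - 19 \;\wedge\; \sum_{i=0}^{3} b_i \times 2^{64 i} < 2^{255} - 19 \;\right\} \\[4pt]
\quad \texttt{x25519\_fe64\_add}(r_0, r_1, r_2, r_3, a_0, a_1, a_2, a_3, b_0, b_1, b_2, b_3) \\[4pt]
\left\{\; \sum_{i=0}^{3} r_i \times 2^{64 i} \equiv \sum_{i=0}^{3} a_i \times 2^{64 i} + \sum_{i=0}^{3} b_i \times 2^{64 i} \pmod{2^{255} - 19} \;\wedge\; \sum_{i=0}^{3} r_i \times 2^{64 i} < 2^{255} - 19 \;\right\}
\end{array}
\]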

Let $a_i$, $b_i$, $r_i$ be 64-bit variables (registers) for $0 \le i \le 3$. The specification 
says that the output field element represented by the $r_i$’s computed by the program 
x25519_fe64_add is the field arithmetic sum of the input elements represented 
by the $a_i$’s and $b_i$’s. In finite field arithmetic programs, overflow or underflow in 
assembly instructions leads to incorrect results, and bit-accurate program verification is 
required. We obtain 46 implementations and generate 96 SMT QF_BV queries 
from verification conditions in order to evaluate our certified solver in this 
experiment. 
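
To make the limb representation concrete, the specification above can be modelled with Python integers. This is only an illustrative sketch, not the verified assembly or the tool's encoding: the helper names (`to_limbs`, `from_limbs`, `satisfies_spec`, `reference_add`, `wrapping_add`) are hypothetical, and limbs are assumed to be stored least significant first.

```python
# Illustrative model of the field-addition specification (not the assembly code).
P = 2**255 - 19          # the field prime used by Ed25519
MASK64 = 2**64 - 1


def to_limbs(x):
    """Split an integer below 2**256 into four 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]


def from_limbs(limbs):
    """Recombine limbs as the sum of limbs[i] * 2**(64*i)."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def satisfies_spec(a, b, r):
    """Check the postcondition for inputs a, b that meet the precondition."""
    A, B, R = from_limbs(a), from_limbs(b), from_limbs(r)
    if not (A < P and B < P):
        raise ValueError("precondition violated")
    return (R - (A + B)) % P == 0 and R < P


def reference_add(a, b):
    """Mathematical reference: add and reduce modulo P."""
    return to_limbs((from_limbs(a) + from_limbs(b)) % P)


def wrapping_add(a, b):
    """Hypothetical faulty adder that only wraps around modulo 2**256."""
    return to_limbs((from_limbs(a) + from_limbs(b)) % 2**256)
```

For the inputs $P - 1$ and $P - 1$, the wrapping adder returns $2P - 2$, which is congruent to the field sum but violates the range condition of the postcondition. This is the kind of bit-level discrepancy that the SMT queries are meant to rule out.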

Table 4 shows the verification results with the same memory and time limits 
as in the 2020 SMT Competition. All SMT QF_BV queries are expected to be 
unsatisfiable. Boolector successfully solves all queries (100%) without 
certification. The 2020 QF_BV track winner Bitwuzla finishes 88 queries (91.67%) 
without certification. Surprisingly, CoqQFBV gives certified answers to 93 
queries (96.88%). The verified SAT certificate checker GRATchk used in 
CoqQFBV successfully validates all certificates for the real-world cryptographic 
program verification problems. In comparison, CVC4 solves 46 queries (47.92%) 
but certifies only 19 (19.79%). The CVC4 certificate checker raises segmentation 
faults on the 27 (= 46 - 19) solved but uncertified queries. These certificates 
are perhaps too complicated to be validated by the unverified LFSC certificate 
checker. For the SMT QF_BV queries from real-world program verification 
problems, our certified solver CoqQFBV seems to perform slightly better than the 
efficient but uncertified SMT QF_BV solver Bitwuzla. Our certified solver is 
probably scalable enough for certain bit-accurate program verification problems. 


9 Conclusion 


We combine algorithm design with interactive theorem proving to build a 
scalable certified SMT QF_BV solver CoqQFBV in this work. Our certified solver 
employs a verified OCaml bit blasting program and the verified certificate 
checker GRATchk to improve the confidence in SMT QF_BV query results. 
Experiments on the QF_BV division of the 2020 SMT Competition and 
real-world cryptographic program verification suggest that CoqQFBV is useful. 
For future work, we plan to specify and verify more heuristics to further 
optimize CoqQFBV. In particular, cryptographic program verification requires more 
sophisticated range checks. More verified bit blasting algorithms for such checks 
will undoubtedly improve the confidence of bit-accurate program verification. 


Abstract. We introduce the notion of porous invariants for multipath 
(or branching/nondeterministic) affine loops over the integers; these 
invariants are not necessarily convex, and can in fact contain infinitely 
many ‘holes’. Nevertheless, we show that in many cases such invariants 
can be automatically synthesised, and moreover can be used to settle 
(non-)reachability questions for various interesting classes of affine loops 
and target sets. 


Keywords: Linear dynamical systems - Linear loops - Invariants - 
Reachability - Presburger arithmetic 


1 Introduction 


We consider the reachability problem for multipath (or branching) affine loops 
over the integers, or equivalently for nondeterministic integer linear dynamical 
systems. A (deterministic) integer linear dynamical system consists of an update 
matrix $M \in \mathbb{Z}^{d \times d}$ together with an initial point $x^{(0)} \in \mathbb{Z}^d$. We associate to such 
a system its infinite orbit $(x^{(i)})$ consisting of the sequence of reachable points 
defined by the rule $x^{(i+1)} = M x^{(i)}$. The reachability question then asks, given 
a target set $Y$, whether the orbit ever meets $Y$, i.e., whether there exists some 
time $i$ such that $x^{(i)} \in Y$. The nondeterministic reachability question allows the 
linear update map to be chosen at each step from a fixed finite collection of 
matrices. 

When the orbit does eventually hit the target, one can easily substantiate this 
by exhibiting the relevant finite prefix. However, establishing non-reachability is 
intrinsically more difficult, since the orbit consists of an infinite sequence of 
points. One requires some sort of finitary certificate, which must be a relatively 
simple object that can be inspected and which provides a proof that the set 
$Y$ is indeed unreachable. Typically, such a certificate will consist of an 
over-approximation $I$ of the set $R$ of reachable points, in such a manner that one can 
check both that $Y \cap I = \emptyset$ and $R \subseteq I$; such a set $I$ is called an invariant. 
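
The asymmetry between the two cases can be seen in a small sketch for the deterministic setting. This is an illustration under stated assumptions: the function names `mat_vec` and `find_hit` and the step bound `max_steps` are hypothetical, and the target set is passed as a membership predicate.

```python
def mat_vec(M, x):
    """Multiply an integer matrix (list of rows) by an integer vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def find_hit(M, x0, in_target, max_steps):
    """Search the first max_steps + 1 orbit points for a point of the target.

    Returns (i, x) with x = M^i x0 in the target, or None if no such point is
    found within the bound. A None result is NOT a proof of unreachability.
    """
    x = list(x0)
    for i in range(max_steps + 1):
        if in_target(x):
            return i, x
        x = mat_vec(M, x)
    return None
```

A returned pair is a finite witness of reachability that can be rechecked by replaying the prefix. When `find_hit` returns `None`, nothing has been established about the infinite remainder of the orbit, which is why a separate certificate such as an invariant is needed.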

Formally we study the following problem for inductive invariants: 

Meta Problem 1. Consider a system with update functions $f_1, \ldots, f_n$. A set $I$ 
is an inductive invariant if $f_i(I) \subseteq I$ for all $i$. Given a reachability query $(x, Y)$ 
we search for a separating inductive invariant $I$ such that $x \in I$ and $Y \cap I = \emptyset$. 


Meta Problem 1 is parametrised by the type of invariants and targets that 
are considered; that is, what are the classes of allowable invariant sets $I$ and 
target sets Y, or equivalently how are such sets allowed to be expressed. 

Fixing a particular invariant and target domain, a reachability query has 
three possible scenarios: (1) the instance is reachable, (2) the instance is unreach- 
able and a separating invariant from the domain exists, or (3) the instance is 
unreachable but no separating invariant exists. Ideally, one would wish to pro- 
vide a sufficiently expressive invariant domain so that the latter case does not 
occur, whilst keeping the resulting invariants as simple as possible and com- 
putable. For some classes of systems, it is known that distinguishing reachability 
(1) from unreachability (2, 3) is undecidable; it can also happen that determin- 
ing whether a separating invariant exists (i.e., distinguishing (2) from (3)) is 
undecidable. 

We note that the existence of strongest inductive invariants¹ is a desirable 
property for an invariant domain—when strongest invariants exist (and can be 
computed), separating (2) from (1, 3) is easy: compute the strongest invariant, 
and check whether it excludes the target state or not; if so, then you are done, 
and if not, no other invariant (from that class) can possibly do the trick either. 
However, unless (3) is excluded, computing the strongest invariant does not nec- 
essarily imply that reachability is decidable. Unfortunately, strongest invariants 
are not always guaranteed to exist for a particular invariant domain, although 
some separating inductive invariant may still exist for every target (or indeed 
may not). 

In prior work from the literature, typical classes of invariants are usually 
convex, or finite unions of convex sets. In this paper we consider certain classes of 
invariants that can have infinitely many ‘holes’ (albeit in a structured and regular 
way); we call such sets porous invariants. These invariants can be represented via 
Presburger arithmetic². We shall work instead with the equivalent formulation 
of semi-linear sets, generalising ultimately periodic sets to higher dimensions, 
as finite unions of linear sets of the form $\{b + p_1\mathbb{N} + \cdots + p_m\mathbb{N}\}$ (by which we 
mean $\{b + a_1 p_1 + \cdots + a_m p_m \mid a_1, \ldots, a_m \in \mathbb{N}\}$, see Definition 2). 

Let us first consider a motivating example: 


Example 1 (Hofstadter’s MU Puzzle [7]). Consider the following term-rewriting 
puzzle over alphabet {M, U, I}. Start with the word MI, and by applying the 
following grammar rules (where y and z stand for arbitrary words over our 
alphabet), we ask whether the word MU can ever be reached. 


yI → yIU | My → Myy | yIIIz → yUz | yUUz → yz 


¹ Given two invariants $I$ and $I'$, we say that $I$ is stronger than $I'$ iff $I \subseteq I'$; thus 
strongest invariants correspond to smallest invariant sets. 

² Presburger arithmetic is a decidable theory over the natural numbers, comprising 
Boolean operations, first-order quantification, and addition (but not multiplication). 

The answer is no. One way to establish this is to keep track of the number 
of occurrences of the letter ‘I’ in the words that can be produced, and observe 
that this number (call it $x$) will always be congruent to either 1 or 2 modulo 3. 
In other words, it is not possible to reach the set $\{x \mid x \equiv 0 \bmod 3\}$. Indeed, 
Rules 2 and 3 are the only rules that affect the number of I’s, and can be 
described by the system dynamics $x \mapsto 2x$ and $x \mapsto x - 3$. Hence the MU Puzzle 
can be viewed as a one-dimensional system with two affine updates,³ or a 
two-dimensional system with two linear updates.⁴ The set $\{1 + 3\mathbb{Z}\} \cup \{2 + 3\mathbb{Z}\}$ is 
an inductive invariant, and we wish to synthesise this. (The stability of this set 
under our two affine functions is easily checked: both components are invariant 
under $x \mapsto x - 3$, and $\{1 + 3\mathbb{Z}\} \mapsto \{2 + 6\mathbb{Z}\} \subseteq \{2 + 3\mathbb{Z}\}$ under $x \mapsto 2x$, and 
similarly $\{2 + 3\mathbb{Z}\} \mapsto \{4 + 6\mathbb{Z}\} \subseteq \{1 + 3\mathbb{Z}\}$.) 
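
The stability check in Example 1 can also be replayed mechanically. The following sketch is an illustration only (the names `is_inductive`, `UPDATES`, and `INVARIANT_RESIDUES` are hypothetical). It relies on the fact that both updates have integer coefficients, so the residue of $f(x)$ modulo 3 depends only on the residue of $x$, and checking one representative per residue class suffices.

```python
MODULUS = 3
INVARIANT_RESIDUES = {1, 2}                      # the set {1 + 3Z} ∪ {2 + 3Z}
UPDATES = [lambda x: 2 * x, lambda x: x - 3]     # x -> 2x and x -> x - 3


def is_inductive(residues, updates, modulus):
    """Check that every update maps each residue class of the set back into the set."""
    return all(f(r) % modulus in residues for r in residues for f in updates)


def contains(residues, modulus, x):
    """Membership of an integer x in the union of the residue classes."""
    return x % modulus in residues
```

With these definitions, `is_inductive(INVARIANT_RESIDUES, UPDATES, MODULUS)` holds, the initial value 1 is contained in the set, and the target 0 is not, which together give the separating inductive invariant described above.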

The problem can be rephrased as a safety property of the following multipath 
loop, verifying that the ‘bad’ state x = 0 is never reached, or equivalently that 
the loop can never halt, regardless of the nondeterministic choices made. 


x = 1 
while x ≠ 0 
    x = 2x || x = x - 3    (where || represents nondeterministic branching) 


The MU Puzzle was presented as a challenge for algorithmic verification in [4]; 
the tools considered in that paper (and elsewhere, to the best of our knowledge) 
rely upon the manual provision of an abstract invariant template. Our approach 
is to find the invariant fully automatically (although one must still abstract from 
the MU Puzzle the correct formulation as the program $x \mapsto 2x \;\|\; x \mapsto x - 3$). 


Main Contributions. Our focus is on the automatic generation of porous 
invariants for multipath affine loops over the integers, or equivalently nondeter- 
ministic integer linear dynamical systems. 


- We first consider targets consisting of a single vector (or ‘point targets’), and 
  present the classes of invariants and systems for which invariants can and 
  cannot be automatically computed for the reachability question. A summary 
  of the results for linear and semi-linear invariants for these targets is given in 
  Table 1. For completeness we also consider R- and R+-(semi)-linear sets, where we 
  complete the picture from prior work by showing that strongest R-semi-linear 
  invariants are computable. 

  • We establish the existence of strongest Z-linear invariants, and show that 
    they can be found algorithmically (Theorem 2). These invariants may or 
    may not separate the target under consideration. 

  • If a Z-linear invariant is not separating, we may instead look for an 
    N-semi-linear invariant (which generalises both Z-semi-linear and N-linear 
    invariants), and we show that such an invariant can always be found 

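³ One-dimensional affine updates are functions of the form $f(x) = ax + b$. 

⁴ The identity $\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix} = \begin{pmatrix} ax + b \\ 1 \end{pmatrix}$ models affine functions using a matrix representation, 
holding one of the entries fixed to 1. 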

Table 1. Results for integer linear dynamical systems for a point target. Det/Non refers 
to deterministic or nondeterministic LDS. “Subsumed by ...” means that sufficient 
invariants can be generated, but of a more general type. 


Dom | D/N | Linear                                      | Semi-linear (SL) 
Z   | Det | Strongest computable (Theorem 2)            | No strongest (Sect. 4.1); subsumed by N-SL 
Z   | Non | Strongest computable (Theorem 2)            | No strongest (Sect. 4.1) 
N   | Det | No strongest (Sect. 4.1); subsumed by N-SL  | No strongest (Sect. 4.1), but sufficient computable (Theorem 4) 
N   | Non | No strongest (Sect. 4.1)                    | 1d-affine decidable (Theorem 6); undec. in general (Theorem 5) 
R   | Det | Strongest: affine relations by Karr [17]    | Strongest: affine closure on Zariski closure (Theorem 1) 
R   | Non | Strongest: affine relations by Karr [17]    | Strongest: affine closure on Zariski closure (Theorem 1) 
R+  | Det | No strongest (Sect. 4.1); subsumed by R+-SL | No strongest, but sufficient computable [8] 
R+  | Non | No strongest (Sect. 4.1)                    | Undecidable [8] 


    for any unreachable point target when dealing with deterministic integer 
    linear dynamical systems (Theorem 4). 

  • However, for nondeterministic integer linear dynamical systems, computing 
    an N-semi-linear invariant is an undecidable problem in arbitrary 
    dimension (Theorem 5). Nevertheless we show how such invariants can be 
    constructed in a low-dimensional setting, in particular for affine updates 
    in one dimension (Theorem 6). As an immediate consequence, this 
    establishes that the multipath loop associated with the MU Puzzle belongs to a 
    class of programs for which we can automatically synthesise N-semi-linear 
    invariants. 

- For full-dimensional⁵ Z-linear targets we show that reachability is decidable, 
  and, in the case of unreachability, that a Z-semi-linear invariant can always 
  be exhibited as a certificate (Theorem 3). If the target is not full-dimensional 
  then the reachability problem is Skolem-hard and undecidable for 
  deterministic and nondeterministic systems respectively. 

- In Sect. 6 we present our tool POROUS, which handles one-dimensional affine 
  systems for both point and Z-linear targets, solving both the reachability 
  problem and producing invariants. Inter alia, this allows one to handle the 
  multipath loop derived from the MU Puzzle in a fully automated manner. 


1.1 Related Work 


The reachability problem (in arbitrary dimension) for loops with a single affine 
update, or equivalently for deterministic linear dynamical systems, is decidable 
in polynomial time for point targets (that is Y = {y}), as shown by Kannan and 
Lipton [16]. However for nondeterministic systems (where the update matrix is 
chosen nondeterministically from a finite set at each time step), reachability is 
undecidable, by reduction from the matrix semigroup membership problem [22]. 

In particular this entails that for unreachable nondeterministic instances we 
cannot hope always to be able to compute a separating invariant. In some cases 


⁵ The affine span covers the entire space. 
we may compute the strongest invariant (which may suffice if this invariant 
happens to be separating for the given reachability query), or we may compute 
an invariant in sub-cases for which reachability is decidable (for example in low 
dimensions). For some classes of invariants, it is also undecidable whether an 
invariant exists (e.g., polyhedral invariants [8]). 

Various types of invariants have been studied for linear dynamical systems, 
including polyhedra [8,23], algebraic [15], and o-minimal [1] invariants. For cer- 
tain classes of invariants (e.g., algebraic [15]), it is decidable whether a separating 
invariant exists, notwithstanding the reachability problem being undecidable. 
Other works (e.g., [5]) use heuristic approaches to generate invariants, without 
aiming for any sort of completeness. 

Kincaid, Breck, Cyphert and Reps [18] study loops with linear updates, 
analysing the closed forms for the variables to prove safety and termination 
erties. Such closed forms, when expressible in certain arithmetic theories, can be 
interpreted as another type of invariant and can be used to over-approximate the 
reachable sets. The work is restricted to a single update function (deterministic 
loops) and places additional constraints on the updates to bring the closed forms 
into appropriate theories. 

Bozga, Iosif and Konecny’s FLATA tool [2] considers affine functions in arbi- 
trary dimension. However, it is restricted to affine functions with finite monoids; 
in our one-dimensional case this would correspond to limiting oneself to counter- 
like functions of the form f(x) =x + b. 

Finkel, Göller and Haase [9], extending Fremont [10], show that reachability 
in a single dimension is PSPACE-complete for polynomial update functions 
(where, in addition, states can be used to control the sequences of updates which can 
be applied). The affine functions (and single-state restriction) we consider are a 
special case, but we focus on producing invariants to disprove reachability. 

Other tools, e.g., AProVE [11] and Büchi Automizer [14], may (dis-)prove 
termination/reachability on all branches, but may not be able to prove 
termination/reachability on some branch. 

Inductive invariants specified in Presburger arithmetic have been used to 
disprove reachability in vector addition systems [20]. A generalisation, ‘almost 
semi-linear sets’ [21], is also non-convex and can capture exactly the reachable 
points of vector addition systems. Our nondeterministic linear dynamical 
tems can be seen as vector addition systems over Z extended with affine updates 
(rather than only additive updates). 


2 Preliminaries 


We denote by $\mathbb{Z}$ the integers and $\mathbb{N}$ the non-negative integers. We say that 
$x, y \in \mathbb{Z}$ are congruent modulo $d \in \mathbb{N}$, denoted $x \equiv y \bmod d$, if $d$ divides 
$x - y$. Given an integer $x$ and natural $d$ we write $(x \bmod d)$ for the number in 
$\{0, \ldots, d - 1\}$ such that $(x \bmod d) \equiv x \bmod d$. 

Definition 1 (Integer Linear Dynamical Systems). A $d$-dimensional 
integer linear dynamical system (LDS) $(x^{(0)}, \{M_1, \ldots, M_k\})$ is defined by an initial 
point $x^{(0)} \in \mathbb{Z}^d$ and a set of integer matrices $M_1, \ldots, M_k \in \mathbb{Z}^{d \times d}$. An LDS is 
deterministic if it comprises a single matrix ($k = 1$) and is otherwise 
nondeterministic. 

A point $y$ is reachable if there exist $m \in \mathbb{N}$ and $B_1, \ldots, B_m$ such that 
$B_1 \cdots B_m x^{(0)} = y$ and $B_i \in \{M_1, \ldots, M_k\}$ for all $1 \le i \le m$. 

The reachability set $\mathcal{O} \subseteq \mathbb{Z}^d$ of an LDS is the set of reachable points. 
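
As a small illustration of Definition 1, the sketch below enumerates the points reachable within a bounded number of steps, using the two-dimensional linear encoding of the MU Puzzle updates with the second coordinate fixed to 1. The function name `reachable_within` and the matrix names `DOUBLE` and `MINUS3` are hypothetical, and a bounded enumeration only under-approximates the reachability set.

```python
def mat_vec(M, x):
    """Multiply an integer matrix (tuple of rows) by an integer vector."""
    return tuple(sum(m * v for m, v in zip(row, x)) for row in M)


def reachable_within(x0, matrices, depth):
    """Return all points B_1 ... B_m x0 with m <= depth and each B_i among the matrices."""
    seen = {tuple(x0)}
    frontier = {tuple(x0)}
    for _ in range(depth):
        # Apply every matrix to every newly discovered point.
        frontier = {mat_vec(M, x) for x in frontier for M in matrices} - seen
        seen |= frontier
    return seen


# Linear encodings of the affine updates on vectors (x, 1).
DOUBLE = ((2, 0), (0, 1))    # (x, 1) -> (2x, 1)
MINUS3 = ((1, -3), (0, 1))   # (x, 1) -> (x - 3, 1)
```

Calling `reachable_within((1, 1), [DOUBLE, MINUS3], 6)` yields only points whose first coordinate is not divisible by 3, in line with Example 1. Such a bounded check is evidence rather than a proof.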


Definition 2 (K-(semi)-linear sets). A linear set $L$ is defined by a base vector 
$b \in \mathbb{Z}^d$ and period vectors $p_1, \ldots, p_d \in \mathbb{Z}^d$ such that 


\[ L = \{\, b + a_1 p_1 + \cdots + a_d p_d \mid a_1, \ldots, a_d \in K \,\}. \]


For convenience we often write $\{b + p_1 K + \cdots + p_d K\}$ for $L$. A set is semi-linear 
if it is the finite union of linear sets. 


N-semi-linear sets are precisely those definable in Presburger arithmetic 
($\mathrm{FO}(\mathbb{Z}, +, \le)$) [12]. However, we can also consider Z-semi-linear sets 
(corresponding to $\mathrm{FO}(\mathbb{Z}, +)$ without order), and the real counterparts (R and R+). Note that 
even if $K = \mathbb{N}$ we still allow $p_i \in \mathbb{Z}^d$. 


Definition 3. Given an integer linear dynamical system $(x^{(0)}, \{M_1, \ldots, M_k\})$, 
a set $I$ is an inductive invariant if 


- $x^{(0)} \in I$, and 
- $\{M_i x \mid x \in I\} \subseteq I$ for all $i \in \{1, \ldots, k\}$. 


Note in particular that every inductive invariant contains the reachability set 
($\mathcal{O} \subseteq I$). We are interested in the following problem: 


Definition 4 (Invariant Synthesis Problem). Given an invariant domain 
$D$, an integer linear dynamical system $(x^{(0)}, \{M_1, \ldots, M_k\})$, and a target $Y$, does 
there exist an inductive invariant $I$ in $D$ disjoint from $Y$? 


In our setting, we are interested in classes D of invariants that are linear, or 
semi-linear. When a separating inductive invariant I exists, we also wish to 
compute it. Since (semi)-linear invariants are enumerable, the decision problem 
is, in theory, sufficient—although all of our proofs are constructive. 


3 R Invariants: R-linear and R-semi-linear 


Before delving into porous invariants, let us consider invariants over the real 
numbers, i.e., described as R-(semi)-linear sets. 

Strongest R-linear invariants are given precisely by the affine hull of the 
reachability set, and can be computed using Karr’s algorithm [17]. Moreover, we 
will show that strongest R-semi-linear invariants also exist and can be computed 
by combining techniques for algebraic invariants [15] and R-linear invariants. 

R-linear. Recall that a set $L$ is R-linear if $L = \{v_0 + v_1\mathbb{R} + \cdots + v_t\mathbb{R}\}$ for some 
$v_0, \ldots, v_t \in \mathbb{Z}^d$ that can be assumed to be linearly independent⁶ without loss of 
generality (and thus $t \le d$). Given two distinct points of $L$, every point on the 
infinite line connecting them must also be in $L$. Generalising this idea to higher 
dimensions, given a set $S \subseteq \mathbb{R}^d$, let the affine hull be 


\[ \overline{S}^{a} = \Big\{\, \sum_{i=1}^{k} \lambda_i x_i \;\Big|\; k \in \mathbb{N},\ x_i \in S,\ \lambda_i \in \mathbb{R},\ \sum_{i=1}^{k} \lambda_i = 1 \,\Big\}. \]

Fix an LDS $(x^{(0)}, \{M_1, \ldots, M_k\})$ and consider its reachability set $\mathcal{O} = 
\{M_{i_m} \cdots M_{i_1} x^{(0)} \mid m \in \mathbb{N},\ i_1, \ldots, i_m \in \{1, \ldots, k\}\}$. Then $\overline{\mathcal{O}}^{a}$ is precisely the 
strongest R-linear invariant. Karr’s algorithm [17,26] can be used to compute 
this strongest invariant in polynomial time. The next lemma follows from 
Theorem 3.1 of [26]. 


Lemma 1. Given an LDS $(x^{(0)}, \{M_1, \ldots, M_k\})$ of dimension $d$, we can compute 
in time polynomial in $d$, $k$, and $\log u$ (where $u > 0$ is an upper bound on the 
absolute values of the integers appearing in $x^{(0)}$ and $M_1, \ldots, M_k$), a Q-affinely 
independent set of integer vectors $R_0 \subseteq \mathcal{O}$ such that: 


1. $x^{(0)} \in R_0$, 
2. the affine span of $R_0$ and the affine span of $\mathcal{O}$ are the same ($\overline{R_0}^{a} = \overline{\mathcal{O}}^{a}$), 
3. the entries of the vectors in Ro have absolute value at most po := (du)?. 


Let $R_0 = \{x^{(0)}, r_1, \ldots, r_{d'}\}$ be obtained as per Lemma 1, with $d' \le d$. The 
R-linear invariant of the LDS is the affine span $\overline{R_0}^{a}$, which can be written as the 
R-linear set $L_0 = \{x^{(0)} + (r_1 - x^{(0)})\mathbb{R} + \cdots + (r_{d'} - x^{(0)})\mathbb{R}\}$. 


R-semi-linear. Let us now generalise this approach to R-semi-linear sets. The 
collection of R-semi-linear sets, $\{\bigcup_{i=1}^{m} L_i \mid m \in \mathbb{N},\ L_1, \ldots, L_m \text{ are R-linear sets}\}$, 
is closed under finite unions and arbitrary intersections⁷. Thus for any given set 
$X$, the smallest R-semi-linear set containing $X$ is simply the intersection of all 
R-semi-linear sets containing $X$. Let us denote by $\overline{X}^{\mathbb{R}}$ this smallest R-semi-linear 
set. We are interested in $\overline{\mathcal{O}}^{\mathbb{R}}$. 


Theorem 1. The strongest R-semi-linear invariant $\overline{\mathcal{O}}^{\mathbb{R}}$ of $\mathcal{O}$ is computable. 


Algebraic sets are those that are definable by finite unions and intersections 
of zeros of polynomials. For example, $\{(x, y) \mid xy = 0\}$ describes the lines $x = 0$ 
and $y = 0$. The (real) Zariski closure $\overline{X}^{z}$ of a set $X$ is the smallest algebraic 
subset of $\mathbb{R}^d$ containing the set $X$. The Zariski closure of the set of reachable 
points, $\overline{\mathcal{O}}^{z}$, can be computed algorithmically [15]. 

⁶ $v_0, \ldots, v_m$ are linearly independent if there does not exist $a_0, \ldots, a_m \in \mathbb{R}$, not all 0, 
such that $a_0 v_0 + \cdots + a_m v_m = 0$. 
⁷ When intersecting a linear set with a semi-linear set, either the latter does not 
change, or one obtains a finite union of elements of smaller dimension. Thus, in an 
infinite intersection, only a finite number of intersections affects the original set. 

An algebraic set $A$ is irreducible if whenever $A \subseteq B \cup C$, where $B$ and $C$ 
are algebraic sets, then we have $A \subseteq B$ or $A \subseteq C$. Any algebraic set (and 
in particular a Zariski closure) can be written effectively as a finite union of 
irreducible sets [3]. 


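Proposition 1. Let $\overline{X}^{z} = A_1 \cup \cdots \cup A_k$, with the $A_i$’s irreducible. Then 


\[ \overline{X}^{\mathbb{R}} = \overline{A_1 \cup \cdots \cup A_k}^{\mathbb{R}} = \overline{A_1}^{a} \cup \cdots \cup \overline{A_k}^{a}. \]


Proof. Since $A_i \subseteq \overline{X}^{\mathbb{R}} = \bigcup_j L_j$, and $A_i$ is irreducible, we have $A_i \subseteq L_j$ for some 
$j$ (as the $L_j$’s are algebraic sets). Since $L_j$ is R-linear, and $\overline{A_i}^{a}$ is the smallest 
R-linear set covering $A_i$, we have $\overline{A_i}^{a} \subseteq L_j$. Taking $\overline{X}^{\mathbb{R}} = \overline{A_1}^{a} \cup \cdots \cup \overline{A_k}^{a}$ is thus 
optimal. 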


Thus $\overline{\mathcal{O}}^{\mathbb{R}}$ can be obtained by computing $\overline{A_i}^{a}$ for each irreducible $A_i$, where 
$\overline{\mathcal{O}}^{z} = A_1 \cup \cdots \cup A_k$. To complete the proof of Theorem 1 it remains to confirm 
that affine hulls of algebraic sets can be computed algorithmically. Let us fix 
an algebraic set $A$, and let $W$ denote a set variable. Proceed as follows. Start 
with $W \leftarrow \{x\}$ for some point $x \in A$, and repeatedly let $W \leftarrow W \cup \{y\}$, where 
$y \in A \setminus \overline{W}^{a}$. Such a point $y$ can always be found using quantifier elimination in 
the theory of the reals. Each step necessarily increases the dimension, which can 
occur at most $d$ times, ensuring termination, at which point one has $\overline{A}^{a} = \overline{W}^{a}$. 
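
The incremental construction of $W$ can be sketched on a finite sample of points. Note the assumption: a finite list of integer points stands in for the algebraic set $A$, and an exact rank computation replaces the quantifier-elimination step of the text, so this illustrates the loop structure only. The function names `rank` and `affine_hull_basis` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def affine_hull_basis(points):
    """Greedily pick an affinely independent subset W with the same affine hull."""
    W = []
    for p in points:
        if not W:
            W.append(p)
            continue
        # Directions spanned by W, relative to its first point.
        dirs = [[a - b for a, b in zip(w, W[0])] for w in W[1:]]
        new = [a - b for a, b in zip(p, W[0])]
        if rank(dirs + [new]) > rank(dirs):   # p lies outside the hull of W
            W.append(p)
    return W
```

Each accepted point raises the dimension of the hull by one, so at most $d + 1$ points are ever kept, mirroring the termination argument above.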


4 Strongest Z-linear Invariants 


Recall that a Z-linear set $\{q + p_1\mathbb{Z} + \cdots + p_n\mathbb{Z}\}$ is defined by a base vector 
$q \in \mathbb{Z}^d$ and period vectors $p_1, \ldots, p_n \in \mathbb{Z}^d$. Equivalently, a Z-linear set describes 
a lattice, i.e., $\{p_1\mathbb{Z} + \cdots + p_n\mathbb{Z}\}$, in $d$-dimensional space, translated to start from 
$q$ rather than 0. 


Theorem 2. Given a $d$-dimensional dynamical system $(x^{(0)}, \{M_1, \ldots, M_k\})$, 
the strongest Z-linear inductive invariant containing the reachability set $\mathcal{O}$ exists 
and can be computed algorithmically. 


The image of a Z-linear set $L = \{q + p_1\mathbb{Z} + \cdots + p_n\mathbb{Z}\}$ by a matrix $M$ is the 
Z-linear set: $M(L) = \{Mq + (Mp_1)\mathbb{Z} + \cdots + (Mp_n)\mathbb{Z}\}$. The following lemma 
asserts that when two points are in a Z-linear set, the direction between these 
two points can be applied from any reachable point, and hence this direction 
can be included as a period without altering the set. 


Proposition 2. Let $L = \{q + a_1 p_1 + \cdots + a_n p_n \mid a_1, \ldots, a_n \in \mathbb{Z}\}$ be a Z-linear 
set. If $x, y \in L$ then for all $z \in L$ and all $a' \in \mathbb{Z}$ we have $z + (y - x)a' \in L$. In 
particular, we have $L = \{q + a_1 p_1 + \cdots + a_n p_n + a'(y - x) \mid a_1, \ldots, a_n, a' \in \mathbb{Z}\}$. 


Proof. If $x = q + a_1 p_1 + \cdots + a_n p_n$ and $y = q + b_1 p_1 + \cdots + b_n p_n$ then 

\[ y - x = q + b_1 p_1 + \cdots + b_n p_n - (q + a_1 p_1 + \cdots + a_n p_n) = (b_1 - a_1) p_1 + \cdots + (b_n - a_n) p_n. \]

Then for any $z = q + c_1 p_1 + \cdots + c_n p_n$, we have 

\[ z + a'(y - x) = q + c_1 p_1 + \cdots + c_n p_n + a'\big((b_1 - a_1) p_1 + \cdots + (b_n - a_n) p_n\big) = q + (c_1 + a'(b_1 - a_1)) p_1 + \cdots + (c_n + a'(b_n - a_n)) p_n, \]

where $(c_i + a'(b_i - a_i)) \in \mathbb{Z}$, so $z + a'(y - x) \in L$. 

Proposition 3. Given two Z-linear sets $L_1 = \{q + p_1\mathbb{Z} + \cdots + p_n\mathbb{Z}\}$ and $L_2 = 
\{s + t_1\mathbb{Z} + \cdots + t_m\mathbb{Z}\}$, there exists a smallest Z-linear set $L$ containing $L_1 \cup L_2$: 
the set $L = \{q + (s - q)\mathbb{Z} + p_1\mathbb{Z} + \cdots + p_n\mathbb{Z} + t_1\mathbb{Z} + \cdots + t_m\mathbb{Z}\}$. 


Proof. First we show $L_1 \cup L_2 \subseteq L$: 


- If $x = q + a_1 p_1 + \cdots + a_n p_n \in L_1$, then $x = q + (s - q)0 + a_1 p_1 + \cdots + a_n p_n + 
  0 t_1 + \cdots + 0 t_m \in L$. 
- If $x = s + b_1 t_1 + \cdots + b_m t_m \in L_2$, then $x = q + (s - q)1 + 0 p_1 + \cdots + 0 p_n + 
  b_1 t_1 + \cdots + b_m t_m \in L$. 


Next we show minimality as a straightforward consequence of Proposition 2. 

Consider any Z-linear set containing $L_1 \cup L_2$. It contains two points of $L_1$ 
differing by $p_i$, so by Proposition 2 each of the vectors $p_1, \ldots, p_n$ can be 
added as a period without altering that set. Similarly, $t_1, \ldots, t_m$ can also be 
included. Finally, by Proposition 2, the vector $s - q$ can be included because 
$q$ and $s$ both belong to $L_1 \cup L_2$. 
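
In one dimension the construction of Proposition 3 collapses to a gcd computation, since the lattice generated by integers $s - q$, $p$ and $t$ is $g\mathbb{Z}$ with $g$ their greatest common divisor. The sketch below is an illustration restricted to this one-dimensional, single-period case. The function name `smallest_z_linear_union` and the (base, period) pair representation are assumptions made here, not notation from the text.

```python
from math import gcd


def smallest_z_linear_union(q, p, s, t):
    """Smallest 1-D Z-linear set {base + period*Z} containing {q + pZ} ∪ {s + tZ}.

    Returns (base, period) with 0 <= base < period when period > 0.
    A period of 0 means the result is the single point {base}.
    """
    period = gcd(gcd(s - q, p), t)      # generator of (s - q)Z + pZ + tZ
    base = q % period if period else q  # normalise the base into [0, period)
    return base, period
```

For instance, the union of $\{1 + 6\mathbb{Z}\}$ and $\{4 + 6\mathbb{Z}\}$ is covered by $\{1 + 3\mathbb{Z}\}$, and no Z-linear set with a larger period contains both 1 and 4.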


A d-dimensional lattice can always be defined by at most d vectors; and thus 
if d is the dimension of the matrices, no more than d period vectors are needed 
in total. However, Proposition 3 induces a representation which may over-specify 
the lattice by producing more than d vectors to define the lattice. 


Example 2. Consider the lattice $\{(2,2)\mathbb{Z} + (0,6)\mathbb{Z} + (2,6)\mathbb{Z}\}$, specified with
three vectors, which is equivalent to the lattice $\{(2,0)\mathbb{Z} + (0,2)\mathbb{Z}\}$. Note that one
may not simply pick an independent subset of the periods, as none of the
following sets are equal: $\{(2,2)\mathbb{Z} + (0,6)\mathbb{Z}\}$, $\{(2,2)\mathbb{Z} + (2,6)\mathbb{Z}\}$, $\{(0,6)\mathbb{Z} + (2,6)\mathbb{Z}\}$,
and $\{(2,2)\mathbb{Z} + (0,6)\mathbb{Z} + (2,6)\mathbb{Z}\}$.


The Hermite normal form can be used to obtain a basis of the vectors that
define the lattice. Consider a lattice $L = \{p_1\mathbb{Z} + \cdots + p_d\mathbb{Z}\}$. The lattice remains
the same if $p_i$ is swapped with $p_j$, if $p_i$ is replaced by $-p_i$, or if $p_i$ is replaced by
$p_i + \alpha p_j$ where $\alpha$ is any fixed integer⁸.

These are the unimodular operations. The Hermite normal form of a matrix
$M$ is a matrix $H$ such that $M = HU$, where $U$ is a unimodular matrix (formed
by unimodular column operations) and $H$ is lower triangular, non-negative and
each row has a unique maximum entry which is on the main diagonal. Such
a form always exists, and the columns of $H$ form a basis of the same lattice
as the columns of $M$, because they differ up to unimodular (lattice-preserving)
operations. There are many texts on the subject; we refer the reader to the
lecture notes of Shmonin [25] for more detailed explanations.

The columns of a matrix in Hermite normal form constitute a unique basis for 
the lattice (up to additional redundant zero columns). Hence a basis of minimal 
dimension can be obtained by computing the Hermite normal form of the matrix 
formed by placing the period vectors into columns. 
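To make the reduction concrete, here is a minimal pure-Python sketch that computes a canonical triangular basis using only the three unimodular operations. It is an illustration under stated assumptions, not the routine used by the authors: the function name `hermite_basis` is hypothetical, and vectors are stored as rows, which is the transpose of the column convention used in the text.

```python
def hermite_basis(vectors):
    """Canonical triangular basis of the lattice spanned by integer vectors.

    Vectors are handled as rows (transpose of the column convention).
    """
    rows = [list(v) for v in vectors]
    dim = len(rows[0]) if rows else 0
    basis = []
    for col in range(dim):
        # Euclid's algorithm on coordinate `col` of the remaining rows.
        while True:
            nonzero = [r for r in rows if r[col] != 0]
            if len(nonzero) <= 1:
                break
            pivot = min(nonzero, key=lambda r: abs(r[col]))
            for r in nonzero:
                if r is not pivot:
                    factor = r[col] // pivot[col]
                    for j in range(dim):
                        r[j] -= factor * pivot[j]
        if not nonzero:
            continue
        pivot = nonzero[0]
        if pivot[col] < 0:
            pivot[:] = [-v for v in pivot]  # replace p by -p
        rows = [r for r in rows if r is not pivot]
        # Reduce earlier basis vectors so this coordinate lies in [0, pivot).
        for b in basis:
            factor = b[col] // pivot[col]
            for j in range(dim):
                b[j] -= factor * pivot[j]
        basis.append(pivot)
    return basis


# The three generators of Example 2 reduce to the basis (2,0), (0,2).
example = hermite_basis([(2, 2), (0, 6), (2, 6)])
```

Because the output is canonical, two generating sets describe the same lattice exactly when this function returns the same basis for both, which gives a direct way to confirm that the four sets listed in Example 2 are pairwise different.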


8 The last replacement is valid, since if $x = y + \beta p_i \in L$ then $x = y + \beta(p_i + \alpha p_j) - \beta\alpha p_j$
is in the new lattice.

We now prove the main theorem:


Proof (Proof of Theorem 2). We claim that Algorithm 1 returns the strongest
Z-linear invariant $I$.
Algorithm 1 proceeds in two phases:


- First find a necessary subset $L_0 \subseteq I$ of the invariant having already the same
dimension as $I$.

- Then compute a growing sequence $L_0 \subsetneq L_1 \subsetneq \cdots \subsetneq L_{m-1} = L_m = I$, where
at each step the algorithm merely increases the density of the attendant sets
in order to ‘fill in’ missing points of the invariant.


Recall the set $R_0 = \{x^{(0)}, r_1, \ldots, r_{d'}\} \subseteq \mathcal{O}$, with $d' \le d$, from Lemma 1. The
resulting Z-linear set $L_0 = \{x^{(0)} + (r_1 - x^{(0)})\mathbb{Z} + \cdots + (r_{d'} - x^{(0)})\mathbb{Z}\}$ is then a
$d'$-dimensional porous subset of the $d'$-dimensional affine hull of the orbit ($L_0$ is
contained in this affine hull). Applying $M_1, \ldots, M_k$ can only increase the density, but not the dimension.
As each $r_i$ and $x^{(0)}$ are in $\mathcal{O}$, by Proposition 2 we can assume that each of the
directions $(r_i - x^{(0)})$ must be represented in any Z-linear set containing $\mathcal{O}$, and
we therefore have that $L_0 \subseteq I$.

In the second phase, we ‘fill in’ the lattice as required to cover the whole of 
O. To do this we repeatedly apply the covering procedure of Proposition 3. That 
is, $L_{i+1}$ is the smallest Z-linear set covering $L_i \cup M_1(L_i) \cup \cdots \cup M_k(L_i)$. To keep
the number of vectors small, we keep the period vectors of the Z-linear set in 
Hermite normal form. 

The vectors $p_1 = (r_1 - x^{(0)}), \ldots, p_{d'} = (r_{d'} - x^{(0)})$ form a parallelepiped
(hyper-parallelogram) that repeats regularly. There are a finite number of
integral points inside this parallelepiped. If new points are added in some step, they
are added to every parallelepiped. Thus we can add new points finitely many
times before saturating or becoming fixed. The volume of the parallelepiped is
bounded above by $|p_1| \cdots |p_{d'}|$.

At each step, the volume of the parallelepiped must at least halve, thus the
volume at step $t$ is $\mathrm{vol}_t \le |p_1| \cdots |p_{d'}| / 2^t$. The procedure must saturate at or
before the volume becomes 1, which occurs after at most $\log(|p_1| \cdots |p_{d'}|) =
\sum_i \log(|p_i|)$ steps. At each step, for efficiency considerations, we convert the
Z-linear set into Hermite normal form to retain exactly $d'$ period vectors.


Claim ($I$ is the strongest invariant). For every invariant $J$, we have $I \subseteq J$.


By induction, let us prove that every invariant $J$ must contain $L_i$. Clearly this
is the case for $L_0$ because all points of $R_0 \subseteq \mathcal{O}$ must be in $J$ and every period
vector in $L_0$ can be present, without loss of generality, thanks to Proposition
2. Assume $L_i \subseteq J$. Then it must be the case that $J$ contains every $M_j(L_i)$, as
otherwise it would not be an invariant. It therefore follows that $J$ must contain
$L_{i+1}$, since the latter is the minimal Z-linear set containing $L_i$ and $M_j(L_i)$ for
all $j \le k$. Finally, since $I$ is itself one of the $L_i$'s, we have $I \subseteq J$ as required.


Remark 1. Note that a Z-linear set is not sufficient for the MU puzzle: both 1 
and 2 are in the reachability set, thus {1 + 1Z} = Z is the strongest Z-linear 
invariant.


Algorithm 1: Strongest Z-linear invariant for LDS $(x^{(0)}, M_1, \ldots, M_k)$
  Input: $x^{(0)}$, $M_1, \ldots, M_k$
  Compute $R_0 = \{x^{(0)}, r_1, \ldots, r_{d'}\} \subseteq \mathcal{O}$
  Compute $p_i = r_i - x^{(0)}$ for $i \in \{1, \ldots, d'\}$
  $L_0 = \{x^{(0)} + p_1\mathbb{Z} + \cdots + p_{d'}\mathbb{Z}\}$
  while True do
      $L_i$ = Covering($L_{i-1} \cup M_1(L_{i-1}) \cup \cdots \cup M_k(L_{i-1})$)
      $H_i$ = HermiteNormalForm($L_i$)
      $L_i = \{x^{(0)} + h_1\mathbb{Z} + \cdots + h_{d'}\mathbb{Z} \mid h_j \text{ column of } H_i\}$
      if $L_i = L_{i-1}$ then
          return $L_i$
      end
  end


4.1 Extensions of Z-linear Sets Without Strongest Invariants 


In this section we show that several generalisations of Z-linear domains fail to 
admit strongest invariants. 

Z-semi-linear sets are unions of Z-linear sets, and therefore can include
singletons. Consider the deterministic dynamical system starting from point 1 and
doubling at each step, $M = (1, (x \mapsto 2x))$. This system has reachability set
$\mathcal{O} = \{2^k \mid k \in \mathbb{N}\}$, which is not even N-semi-linear (our most general class). For
this LDS we can construct the invariant $\{1, 2, 4, \ldots, 2^k\} \cup \{2^{k+1} p \mid p \in \mathbb{Z}\}$ for
each $k$. For any proposed strongest Z-semi-linear invariant, one can find a $k$ for
which the corresponding invariant is an improvement.
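The absence of a strongest invariant for the doubling system can be checked on a bounded window with a short Python sketch. It assumes that the family of candidate invariants has the form $I_k = \{1, 2, 4, \ldots, 2^k\} \cup \{2^{k+1} p \mid p \in \mathbb{Z}\}$; both this form and the helper names `in_candidate` and `check` are assumptions made for the illustration.

```python
def in_candidate(k, v):
    """Membership in the assumed invariant I_k (powers of two up to 2^k,
    together with all multiples of 2^(k+1))."""
    powers = {2 ** j for j in range(k + 1)}
    return v in powers or v % 2 ** (k + 1) == 0


def check(k, bound=1024):
    """On the window [-bound, bound]: I_k is closed under doubling,
    and I_(k+1) is a strict subset of I_k."""
    window = range(-bound, bound + 1)
    closed = all(in_candidate(k, 2 * v) for v in window if in_candidate(k, v))
    nested = all(in_candidate(k, v) for v in window if in_candidate(k + 1, v))
    strict = any(in_candidate(k, v) and not in_candidate(k + 1, v) for v in window)
    return closed and nested and strict


results = [check(k) for k in range(6)]
```

Each $I_{k+1}$ is a strictly smaller invariant than $I_k$, so the family has no least element; for instance $3 \cdot 2^{k+1}$ belongs to $I_k$ but not to $I_{k+1}$.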

N-linear sets generalise Z-linear sets (observe that Z-linear sets are a proper
subclass, since $\{x + p_i\mathbb{Z}\}$ can be expressed as $\{x + (-p_i)\mathbb{N} + p_i\mathbb{N}\}$, but $\{x + p_i\mathbb{N}\}$
is clearly not Z-linear). Consider the LDS $((x_1, x_2), \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix})$, with a reachability
set consisting of just two points $x = (x_1, x_2)$ and $y = (x_2, x_1)$. There are two
incomparable candidates for the minimal N-linear invariant: $\{x + (y - x)\mathbb{N}\}$ and
$\{y + (x - y)\mathbb{N}\}$. Similarly for R₊-linear invariants, the sets $\{y + (x - y)\mathbb{R}_+\}$ and
$\{x + (y - x)\mathbb{R}_+\}$ are incomparable half-lines.


4.2 Z-linear Targets 


We have so far only considered invariants for point targets. We now turn to 
lattice-like targets, in particular targets specified as full-dimensional Z-linear 
sets. 


Theorem 3. It is decidable whether a given LDS $(x^{(0)}, \{M_1, \ldots, M_k\})$ reaches
a full-dimensional Z-linear target $Y = \{x + p_1\mathbb{Z} + \cdots + p_d\mathbb{Z}\}$, with $x, p_i \in \mathbb{Z}^d$.

Furthermore, for unreachable instances, a Z-semi-linear inductive invariant
can be provided.

Theorem 3 requires the targets to be full-dimensional. For nondeterministic
systems reachability is undecidable for non-full-dimensional targets (in
particular point targets) [22]. However, even for deterministic systems, when Z-linear
targets fail to be full-dimensional the reachability problem becomes as hard as
the Skolem problem (see, e.g. [24]), for example by choosing as target the set
$\{(0, x_2, \ldots, x_d) \mid x_2, \ldots, x_d \in \mathbb{Z}\} = \{0 + e_2\mathbb{Z} + \cdots + e_d\mathbb{Z}\}$, where $e_i \in \{0,1\}^d$ is
the standard basis vector, with $(e_i)_i = 1$ and $(e_i)_j = 0$ for $i \ne j$.

Towards proving Theorem 3, we first show that full-dimensional linear sets 
can be expressed as ‘square’ hybrid-linear sets. Hybrid-linear sets are semi-linear 
sets in which all the components share the same period vectors, and thus differ 
only in starting position (whereas semi-linear sets allow each component to have 
distinct period vectors). By square, we mean that all period vectors are the same 
multiple of standard basis vectors. 


Lemma 2. Let $Y = \{x + p_1\mathbb{Z} + \cdots + p_d\mathbb{Z}\}$ be a full-dimensional Z-linear set.
Then there exists $m \in \mathbb{N}$ and a finite set $B \subseteq [0, m-1]^d$ such that $Y =
\bigcup_{b \in B} \{b + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$.


Proof. Suppose $p_1, \ldots, p_d$ span a $d$-dimensional vector space. Let
$P = \begin{pmatrix} p_1 \\ \vdots \\ p_d \end{pmatrix}$
be the matrix with rows $p_1, \ldots, p_d$. Since $P$ is full row rank it is invertible,
hence there exists a rational matrix $P^{-1}$ such that $e_i = P^{-1}_{i,1} p_1 + \cdots + P^{-1}_{i,d} p_d$.
In particular let $m_i$ be such that $P^{-1}_{i,j} m_i$ is integral for all $j$. Then there is an
integral combination of $p_1, \ldots, p_d$ such that $m_i e_i$ is an admissible direction in $Y$.

Let $m = \mathrm{lcm}\{m_1, \ldots, m_d\}$. Then $me_i$ is an admissible direction in $Y$ for every $i$. Hence
by Proposition 2, $Y$ is equivalent to $\{x + p_1\mathbb{Z} + \cdots + p_d\mathbb{Z} + me_1\mathbb{Z} + \cdots +
me_d\mathbb{Z}\}$. By the presence of $me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}$ we have that $x \in Y$ if and
only if $x' \in Y$ where $x'_i = (x_i \bmod m)$.

And therefore $Y$ can be written as $\bigcup_{b \in B} \{b + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$, where
$B = [0, m-1]^d \cap Y$.
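The construction in the proof of Lemma 2 can be followed step by step in dimension two. The sketch below is an illustration under stated assumptions: the function name `square_form_2d` is hypothetical, the code is restricted to $d = 2$ so that $P^{-1}$ can be written out explicitly, and it uses `math.lcm`, which requires Python 3.9 or later.

```python
from fractions import Fraction
from itertools import product
from math import lcm


def square_form_2d(x, p1, p2):
    """Return (m, B) with Y = union over b in B of {b + m*e1*Z + m*e2*Z},
    for the full-dimensional set Y = {x + p1*Z + p2*Z}."""
    det = p1[0] * p2[1] - p1[1] * p2[0]
    if det == 0:
        raise ValueError("the period vectors must be linearly independent")
    # Inverse of the matrix P whose rows are p1 and p2.
    inv = [[Fraction(p2[1], det), Fraction(-p1[1], det)],
           [Fraction(-p2[0], det), Fraction(p1[0], det)]]
    # m clears every denominator of P^{-1}, so m*e_i is a direction of Y.
    m = lcm(*(entry.denominator for row in inv for entry in row))

    def in_y(z):
        # z is in Y iff (z - x) * P^{-1} is an integer row vector.
        delta = (z[0] - x[0], z[1] - x[1])
        coeffs = [delta[0] * inv[0][j] + delta[1] * inv[1][j] for j in range(2)]
        return all(c.denominator == 1 for c in coeffs)

    residues = [b for b in product(range(m), repeat=2) if in_y(b)]
    return m, residues


m, B = square_form_2d((0, 0), (2, 2), (0, 6))
```

For the periods $(2,2)$ and $(0,6)$ this gives $m = 6$ and three residue points, consistent with the lattice having index 12 in $\mathbb{Z}^2$ while $6\mathbb{Z}^2$ has index 36.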


We now prove Theorem 3. 


Proof (Proof of Theorem 3). Choose $m$ and $B$ as in Lemma 2, so that $Y$ is of
the form $\bigcup_{b \in B} \{b + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$. We build an invariant $I$ of the form
$\bigcup_{b \in B'} \{b + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$ for some $B' \subseteq [0, m-1]^d$.

We initialise the set $I_0 = \{x + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$, where $x \in [0, m-1]^d$
is such that $x_i = (x^{(0)}_i \bmod m)$. We then build the set $I_1$ by adding to $I_0$ the sets
$\{y + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$ where for each choice of $M_i$, $y \in [0, m-1]^d$ is formed
by $y_j = ((M_i x)_j \bmod m)$ for some $x \in I_0$. We iterate this construction until it
stabilises in an inductive invariant $I$. Termination follows from the finiteness of
$[0, m-1]^d$ (noting in particular that if termination occurs with $B' = [0, m-1]^d$,
then $I = \mathbb{Z}^d$ which is indeed an inductive invariant).

If there exists $y \in B \cap I$ then return REACHABLE. This is because the same
sequence of matrices applied to $x^{(0)}$ to produce $y \in I$ would, thanks to the
modulo step, wind up inside the set $\{y + me_1\mathbb{Z} + \cdots + me_d\mathbb{Z}\}$, which is a part
of the target.

Otherwise, return UNREACHABLE and $I$ as invariant. By construction, $I$ is
indeed an inductive invariant disjoint from the target set.
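For one-dimensional affine systems the procedure in this proof reduces to saturating residues modulo $m$. The following Python sketch illustrates this under stated assumptions: the function name and the `(a, b)` encoding of $x \mapsto ax + b$ are introduced here for illustration and are not taken from POROUS. The example uses the standard abstraction of the MU Puzzle by the number of I symbols, where one rule doubles the count and another subtracts three.

```python
def reaches_zlinear_target(x0, funcs, target_offset, target_period):
    """Decide whether the orbit of x0 under the affine maps in `funcs`
    meets {target_offset + target_period * Z}; also return the residue
    classes modulo m that make up the invariant."""
    m = abs(target_period)
    if m == 0:
        raise ValueError("a full-dimensional target needs a non-zero period")
    residues = {x0 % m}
    frontier = [x0 % m]
    while frontier:
        r = frontier.pop()
        for a, b in funcs:
            s = (a * r + b) % m
            if s not in residues:
                residues.add(s)
                frontier.append(s)
    return (target_offset % m) in residues, sorted(residues)


# MU Puzzle abstraction: start with one I; rules x -> 2x and x -> x - 3.
mu_rules = [(2, 0), (1, -3)]
reachable, classes = reaches_zlinear_target(1, mu_rules, 0, 3)
```

Here the target $\{0 + 3\mathbb{Z}\}$ is reported unreachable, and the returned classes describe the Z-semi-linear invariant $\{1 + 3\mathbb{Z}\} \cup \{2 + 3\mathbb{Z}\}$.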


Remark 2. By the same argument, Theorem 3 extends to a restricted class of
Z-semi-linear targets: finite unions of full-dimensional Z-linear sets.


5 N-Semi-linear Invariants 


We now consider N-semi-linear invariants, our most general class. N-semi-linear 
invariants gain expressivity thanks to the ‘directions’ provided by the period 
vectors. For example, the only possible Z-semi-linear invariant for the LDS
$(0, (x \mapsto x + 1))$ is $\mathbb{Z}$, yet the reachability set, $\mathbb{N}$, is captured exactly by an N-linear
invariant. We show that a separating N-semi-linear invariant can always
be found for unreachable instances of deterministic integer LDS, although the 
computed invariant will depend on the target. However, finding invariants is 
undecidable for nondeterministic systems, at least in high dimension. Neverthe- 
less, we show decidability for the low-dimensional setting of the MU Puzzle—one 
dimension with affine updates. 


5.1 Existence of Sufficient (but Non-minimal) N-semi-linear 
Invariants for Point Reachability in Deterministic LDS 


Kannan and Lipton showed decidability of reachability of a point target for 
deterministic LDS [16]. In this subsection, we establish the following result to 
provide a separating invariant in unreachability instances. 


Theorem 4. Given a deterministic LDS $(x^{(0)}, M)$ together with a point target
$y$, if the target is unreachable then a separating N-semi-linear inductive invariant
can be provided.


To do so, we will invoke the results from [8] to compute an R₊-semi-linear
inductive invariant, and then extract from it an N-semi-linear inductive
invariant. More precisely, the authors of [8] show how to build polytopic inductive
invariants for certain deterministic LDS. Such polytopes are either bounded or
are R₊-semi-linear sets. In the first case, the polytope contains only finitely
many integral points, which can directly be represented via an N-semi-linear set.
In the second case, we build an N-semi-linear set containing exactly the set of
integral points included in the R₊-semi-linear invariant, thanks to the following
lemma.


Lemma 3. Given an R₊-linear set $S = \{x + \sum_i p_i \mathbb{R}_+\}$, where the vectors $p_i$
have rational coefficients and $x$ is an integer vector, one can build an
N-semi-linear set $N$ comprising precisely all of the integral points of $S$.


Proof (Proof of Theorem 4). We note that every invariant produced in [8] has
rational period vectors, as the vectors are given by the difference of successive
points in the orbit of the system, and thus Lemma 3 can be applied. The authors
of [8] build an inductive invariant in all cases except those for which every eigen- 
value of the matrix governing the evolution of the LDS is either 0 or of modulus 
1 and at least one of the latter is not a root of unity. This situation however 
cannot occur in our setting. Indeed, the eigenvalues of an integer matrix are alge- 
braic integers, and an old result of Kronecker [19] asserts that unless all of the 
eigenvalues are roots of unity, one of them must have modulus strictly greater 
than 1 (the case in which all eigenvalues are 0 being of course trivial). 
This concludes the proof of Theorem 4. 


5.2 Undecidability of N-semi-linear Invariants for Nondeterministic 
LDS 


While the enhanced expressivity of N-semi-linear sets always allows us to find an
invariant for deterministic LDS, it contributes in turn to making the
invariant-synthesis problem undecidable when the LDS is not deterministic. We establish
this through a reduction from the infinite Post correspondence problem (ω-PCP),
which can be defined in the following way: given $m$ pairs of non-empty words
$\{(u^1, v^1), \ldots, (u^m, v^m)\}$ over alphabet $\{0, 2\}$, does there exist an infinite word
$w = w_1 w_2 \ldots$ over alphabet $\{1, \ldots, m\}$ such that $u^{w_1} u^{w_2} \cdots = v^{w_1} v^{w_2} \cdots$? This
problem is known to be undecidable when $m$ is at least 8 [6,13].


Theorem 5. The invariant synthesis problem for N-semi-linear sets and linear 
dynamical systems with at least two matrices of size 91 is undecidable. 


Proof (Sketch). We first establish the result in the case of several matrices in 
low dimension; this can then be transformed in a standard way to two larger 
matrices (of size 91). 

The proof is by reduction from the infinite Post correspondence problem.
Given an instance of this problem, the pair of words corresponding to each
sequence of tiles has an integer representation, using base-4 encoding. An
important property of our encoding is that the operation of appending a new tile to
an existing pair of words can be encoded by matrix multiplication.

Recall that if the instance of ω-PCP is negative, then every generated pair
of words will differ at some point. Our encoding is such that this difference of
letters creates a difference in their numerical encodings that can be identified
with an N-semi-linear invariant. On the other hand, when there is a positive
answer to the ω-PCP instance, there can be no N-semi-linear invariant.


5.3 Nondeterministic One-Dimensional Affine Updates 


The previous section shows that point reachability for nondeterministic LDS
is undecidable once there are sufficiently many dimensions, motivating an analysis
at lower dimensions. The MU Puzzle requires a single dimension with affine
updates (or equivalently two dimensions in matrix representation, with the
coordinate along the second dimension kept constant). We consider this
one-dimensional affine-update case, and therefore, rather than taking matrices as
input, we directly work with affine functions of the form $f_i(x) = a_i x + b_i$.


Theorem 6. Given $x^{(0)}, y \in \mathbb{Z}$, along with a finite set of functions $\{f_1, \ldots, f_k\}$
where $f_i(x) = a_i x + b_i$, $a_i, b_i \in \mathbb{Z}$ for $1 \le i \le k$, it is decidable whether $y$ is
reachable from $x^{(0)}$.

Moreover, when y is unreachable, an N-semi-linear separating inductive 
invariant can be algorithmically computed. 


We note that decidability of reachability is already known [9,10]. We refine 
this result by exhibiting an invariant which can be used to disprove reachability. 
In fact our procedure will produce an N-semi-linear set which can be used to 
decide reachability, and which, in instances of non-reachability, will be a sep- 
arating inductive invariant. We have implemented this algorithm into our tool 
POROUS, enabling us to efficiently tackle the MU Puzzle as well as its generali- 
sation to arbitrary collections of one-dimensional affine functions. We report on 
our experiments in Sect. 6. 

We build a case distinction depending on the type of functions that appear: 


Definition 5. A function $f(x) = ax + b$...


- ... is redundant if $f(x) = b$ (including possibly $b = 0$), or if $f(x) = x$.

- ... is counter-like if $f(x) = x + b$, $b \ne 0$. Two counter-like functions, $f(x) =
x + b$ and $g(x) = x + c$, are opposing if $b > 0$ and $c < 0$ (or vice-versa).

- ... is growing if $f(x) = ax + b$ and $|a| \ge 2$. We say a growing function is
inverting if $a \le -2$.

- ... is pure inverting if $f(x) = -x + b$.


Simplifying Assumptions 


Lemma 4. Without loss of generality, redundant functions are redundant; more 
precisely, we can reduce the computation of an invariant for a system having 
redundant functions to finitely many invariant computations for systems devoid 
of such functions. 


Proof. Clearly the identity function has no impact on the reachability set, and 
so can be removed outright. For any other redundant function, its impact on 
the reachability set does not depend on when the function is used, and we may 
therefore assume that it was used in the first step, or equivalently, using an alter- 
native starting point. Hence the invariant-computation problem can be reduced 
to finitely many instances of the problem over different starting points, with 
redundant functions removed. Finally, taking the union of the resulting invari- 
ants yields an invariant for the original system. 


Lemma 5. Without loss of generality, $x^{(0)} \ge 0$.

Proof. We construct a new system, where each transition $f(x) = ax + b$ is
replaced by $f'(x) = ax - b$. Then $x^{(0)}$ reaches $y$ in the original system if and only
if $-x^{(0)}$ reaches $-y$ in the new system. To see this, observe that if $f(x) = ax + b$
then $f'(-x) = -ax - b = -f(x)$.


Lemma 6. Suppose there are at least two distinct pure inverting functions (and 
possibly other types of functions). Then without loss of generality there are two 
opposing counters. 


Proof. Consider $f(x) = -x + b$ and $g(x) = -x + c$. Then $f(g(x)) = -(-x +
c) + b = x + b - c$ and $g(f(x)) = -(-x + b) + c = x + c - b$. Since $b - c = -(c - b)$
and $b \ne c$ (as $f \ne g$), these two functions are opposing.
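The computation in this proof is an instance of composing affine maps, which a few lines of Python make explicit. The pair encoding `(a, b)` for $x \mapsto ax + b$ and the helper name `compose` are conventions assumed for this sketch.

```python
def compose(f, g):
    """Return f after g for affine maps encoded as (a, b), meaning x -> a*x + b."""
    (a1, b1), (a2, b2) = f, g
    return (a1 * a2, a1 * b2 + b1)


# Two distinct pure inverters, f(x) = -x + 5 and g(x) = -x + 2.
f, g = (-1, 5), (-1, 2)
fg = compose(f, g)  # x -> x + 3, a positive counter
gf = compose(g, f)  # x -> x - 3, the opposing counter
```

The two compositions have slope 1 and offsets $b - c$ and $c - b$, so they are counters moving in opposite directions whenever $b \ne c$.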


Two Opposing Counters. Let us first observe that when there are two oppos- 
ing counters, we essentially move in either direction by some fixed amount. This 
will entail that only Z-(semi)-linear invariants can be produced, rather than 
proper N-(semi)-linear invariants. 


Lemma 7. Suppose there are two opposing counters, $f(x) = x + b$ and $g(x) =
x - c$. Then for any reachable $x$ we have $\{x + d\mathbb{Z}\} \subseteq I$ for $d = \gcd(b, c)$.


Therefore, starting with $\{x^{(0)} + d\mathbb{Z}\} \subseteq I$ we can ‘saturate’ the invariant
under construction using the following lemma:


Lemma 8. Let $h(x) = x + d$ be chosen as a reference counter amongst the
counters. If $\{x + d\mathbb{Z}\} \subseteq I$, then $\{f(x) + d\mathbb{Z}\} \subseteq I$ for every function $f$.


Proof (Proof of Lemma 8). Consider the function $f(x) = ax + b$. If $x + dk \in I$,
then $f(x + dk) = a(x + dk) + b = f(x) + adk \in I$.

Now thanks to the presence of the counter $h(x) = x + d$, by choosing the initial
$k \in \mathbb{Z}$ appropriately and applying $h$ sufficiently many times (say $m \in \mathbb{N}$
times), one can reach $f(x) + adk + dm = f(x) + dn$ for any desired $n \in \mathbb{Z}$.


Without loss of generality if $\{x + d\mathbb{Z}\}$ is in the invariant, then $0 \le x < d$.
We then repeatedly use Lemma 8 to find the required elements of the invariant.
Since there are only finitely many residue classes (modulo $d$), every reachable
residue class $\{c_1, \ldots, c_n\}$ can be found by saturation (in at most $d$ steps), yielding
invariant $\{c_1 + d\mathbb{Z}\} \cup \cdots \cup \{c_n + d\mathbb{Z}\}$.

Thanks to Lemma 6, in all remaining cases there is without loss of generality 
at most one pure inverter. 


Only Pure Inverters. If there is exactly one pure inverter $f(x) = -x + b$ (and
no other types of functions), then $f(x^{(0)}) = -x^{(0)} + b$ and $f(-x^{(0)} + b) = x^{(0)} - b +
b = x^{(0)}$, thus the reachability set is finite, with exact invariant $\{x^{(0)}, -x^{(0)} + b\}$.


No Counters. If we are not in the preceding case and there are no counters, then
there must be growing functions and, by Lemma 6, without loss of generality at
most one pure inverter. We show that all growing functions increase the modulus
outside of some bounded region.


Lemma 9. For every $M \ge 0$ and every growing function $f(x) = ax + b$, $|a| \ge 2$,
there exists $C^M_f \ge 0$ such that if $|x| \ge C^M_f$ then $|f(x)| \ge |x| + M$.


Proof. By the triangle inequality we have: $|f(x)| = |ax + b| \ge |a||x| - |b|$. Thus

$$|x| \ge \frac{|b| + |M|}{|a| - 1} \implies |a||x| - |b| \ge |x| + |M| \implies |f(x)| \ge |x| + M.$$


This is the only situation in which the invariant is not exactly the reachability 
set, and requires us to take an overapproximation. 


Let $C = \max\{C^0_{f_1}, \ldots, C^0_{f_k}, |y| + 1\}$ for $f_1, \ldots, f_k$ growing functions. If
there are no pure inverters then $\{-C - \mathbb{N}\} \cup \{C + \mathbb{N}\}$ is invariant (although
it may not yet contain the whole of $\mathcal{O}$). However, we can return the inductive
invariant $\{-C - \mathbb{N}\} \cup \{C + \mathbb{N}\} \cup (\mathcal{O} \cap (-C, C))$. The set $\mathcal{O} \cap (-C, C)$ is finite
and can be elicited by exhaustive search, noting that once an element of the orbit
reaches absolute value at least $C$, the remainder of the corresponding trajectory
remains forever outside of $(-C, C)$.
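The case with only growing functions can be sketched in Python as follows. This is a hedged illustration, not the POROUS implementation: the names `growth_threshold` and `no_counter_invariant` are hypothetical, functions are encoded as pairs `(a, b)` for $x \mapsto ax + b$, the threshold is the integer ceiling of the bound from the proof of Lemma 9, and taking $M = 0$ in the definition of $C$ is an assumption.

```python
def growth_threshold(a, b, M=0):
    """Smallest integer C with |x| >= C implying |a*x + b| >= |x| + M,
    using the bound (|b| + M) / (|a| - 1) from the proof of Lemma 9."""
    if abs(a) < 2:
        raise ValueError("only growing functions (|a| >= 2) are supported")
    return -(-(abs(b) + M) // (abs(a) - 1))  # integer ceiling


def no_counter_invariant(x0, y, funcs):
    """Return (C, inside) such that {-C - N}, {C + N} and the finite set
    `inside` together form the invariant; `inside` is the orbit within (-C, C)."""
    C = max([growth_threshold(a, b) for a, b in funcs] + [abs(y) + 1])
    inside, frontier = set(), [x0]
    while frontier:
        x = frontier.pop()
        if abs(x) >= C or x in inside:
            continue  # outside (-C, C) the trajectory never returns
        inside.add(x)
        frontier.extend(a * x + b for a, b in funcs)
    return C, sorted(inside)


# Growing functions x -> 2x + 1 and x -> -3x + 2, start 1, target 10.
C, inside = no_counter_invariant(1, 10, [(2, 1), (-3, 2)])
target_reachable = 10 in inside
```

Since the target satisfies $|y| < C$ by construction, it is reachable exactly when it appears in the finite part of the invariant, which is why the invariant depends on the target.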

If there is one pure inverter $g(x) = -x + d$ then observe that $-C$ is mapped
to $C + d$ and $C + d$ is mapped to $-C$. Thus intuitively we want to use the
interval $(-C, C + d)$. However two problems may occur: (a) since $d$ could be
less than 0, $C + d$ may no longer be growing (under the application of the
growing functions), and (b) an inverting growing function only ensures that $-C$ is
mapped to a value greater than or equal to $C$, rather than $C + d$. Hence, we choose
$C'$ to ensure that $C' + d$ is still growing by at least $|d|$ (under the application
of our growing functions). Let $C' = \max\{C^{|d|}_{f_1}, \ldots, C^{|d|}_{f_k}, |y| + 1\} + |d|$. Then the
invariant is $\{-C' - \mathbb{N}\} \cup \{C' + d + \mathbb{N}\} \cup (\mathcal{O} \cap (-C', C' + d))$.


Non-opposing Counters. The only remaining possibility (if there do not exist 
two opposing counters, and not all functions are growing or pure inverters), 
is that there are counter-like functions, but they are all counting in the same 
direction. There may also be a single pure inverter, and possibly some growing 
functions. 

Pick a counter $h(x) = x + d$ to be the reference counter; the choice is arbitrary,
but it is convenient to pick a counter with minimal $|d|$. As a starting point, we
have $\{x^{(0)} + d\mathbb{N}\} \subseteq I$.


Lemma 10. If there is an inverter $g(x) = -ax + b$, with $a > 0$, $b \in \mathbb{Z}$, and we
have $\{x + d\mathbb{N}\} \subseteq I$ then $\{g(x) + d\mathbb{Z}\} \subseteq I$.


The crucial difference with Lemma 8 is the observation that now an N-linear set
has induced a Z-linear set.

Proof. Let $r = g(x) + dm$ for $m \in \mathbb{Z}$. We show $r \in I$. Consider $x + dn$ for
$n \in \mathbb{N}$, then $g(x + dn) = -a(x + dn) + b = -ax + b - adn = g(x) - adn$. Hence
$g(x) - adn + dk$, $n, k \in \mathbb{N}$, is reachable by applying $k$ times the function $h(x)$.
Hence for any $m \in \mathbb{Z}$ there exist $k, n \in \mathbb{N}$ such that $k - na = m$, so that $r$ is
indeed reachable.


Similarly to the situation with two opposing counters, whenever the invariant 
contains some Z-linear set, Lemma 8 allows us to saturate amongst the finitely 
many reachable residue classes. 

However, the invariant may contain subsets that are not Z-linear. Consider 
$\{x + d\mathbb{N}\} \subseteq I$, which is not yet invariant. We repeatedly apply non-inverting
functions to {x + dN} to obtain new N-linear sets (not Z-linear sets). When 
the function applied ‘moves’ in the direction of the counters this will ultimately 
saturate (in particular when applying other counter functions). However, in the 
opposite direction, we may generate infinitely many such classes. 


Example 3. Consider the reference counter $h(x) = x + 4$, with initial point 5. This
yields an initial set $\{5 + 4\mathbb{N}\} \subseteq \mathcal{O}$, where 5 is the initial point and $4\mathbb{N}$ is derived
from the counter increment. Now when applying $x \mapsto 2x + 6$ to $\{5 + 4\mathbb{N}\}$ we
obtain $\{10 + 6 + 8\mathbb{N} + 4\mathbb{N}\} = \{16 + 4\mathbb{N}\}$, then $\{38 + 4\mathbb{N}\}$, and then $\{82 + 4\mathbb{N}\}$.
However $\{82 + 4\mathbb{N}\} \subseteq \{38 + 4\mathbb{N}\}$ and we can therefore stop with the invariant
$\{5 + 4\mathbb{N}\} \cup \{16 + 4\mathbb{N}\} \cup \{38 + 4\mathbb{N}\}$.

However, if the initial sequence is not moving in the direction of the reference
counter, this saturation does not occur. Consider $\{5 + 4\mathbb{N}\}$ with the function
$x \mapsto 2x - 6$. Then $\{5 + 4\mathbb{N}\}$ maps to $\{10 - 6 + 8\mathbb{N} + 4\mathbb{N}\} = \{4 + 4\mathbb{N}\}$, which
maps to $\{2 + 4\mathbb{N}\}$, $\{-2 + 4\mathbb{N}\}$, $\{-10 + 4\mathbb{N}\}$, $\{-26 + 4\mathbb{N}\}$, and so on. However
$-2$ and $-10$ are both 2 modulo 4 (and so is $-26$ as well). This means that in the
negative direction we can obtain arbitrarily large negative values congruent to
2 modulo 4 and then use the reference counter $h(x) = x + 4$ to obtain any value
of $\{2 + 4\mathbb{Z}\}$.
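Both behaviours of Example 3 can be reproduced with a short Python sketch. It is a simplified illustration that follows a single non-inverting function $x \mapsto ax + b$ with $a > 0$ alongside a reference counter with increment $d > 0$; the name `chase` and its return convention are assumptions made here.

```python
def chase(base, d, a, b, max_steps=50):
    """Push {base + d*N} repeatedly through x -> a*x + b (a > 0, d > 0).

    Returns (bases, None) when the next image is already covered by an
    earlier set, or (bases, r) when the residue class r modulo d repeats
    at a smaller base, in which case {r + d*Z} is induced."""
    bases = [base]
    for _ in range(max_steps):
        nxt = a * bases[-1] + b
        same_class = [c for c in bases if (nxt - c) % d == 0]
        if any(nxt >= c for c in same_class):
            return bases, None  # saturation in the counter direction
        if same_class:
            return bases, nxt % d  # repetition against the counter direction
        bases.append(nxt)
    raise RuntimeError("no repetition found within max_steps")


up = chase(5, 4, 2, 6)     # x -> 2x + 6
down = chase(5, 4, 2, -6)  # x -> 2x - 6
```

The first call stops with the bases 5, 16 and 38, matching the invariant of the example; the second detects that the class of 2 modulo 4 repeats downwards, so $\{2 + 4\mathbb{N}\}$ should be replaced by $\{2 + 4\mathbb{Z}\}$.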


Clearly we can examine all reachable residue classes defined by our reference 
counter. Any residue class reachable after an inverting function induces a Z-linear 
set. So it remains to consider those N-linear sets reachable without inverting 
functions. The remaining case to handle occurs when we repeatedly induce N- 
linear sets until they repeat a residue class in the direction opposite to that of 
the reference counter. 

We consider the case for $h(x) = x + d$ with $d > 0$. The case with $h(x) = x - d$
is symmetric. It remains to detect when a set $\{x + d\mathbb{N}\}$ leads to $\{y + d\mathbb{N}\}$ by
a sequence of non-inverting functions with $x \equiv y \bmod d$ and $y < x$. Then by repeated
application of these functions one can reach sets $\{z + d\mathbb{N}\}$ with $z$ arbitrarily
small, hence we can replace $\{x + d\mathbb{N}\}$ by $\{x + d\mathbb{Z}\}$. We give further details in
the full version.


Reachability. The above procedure is sufficient to decide reachability. In all 
cases apart from that in which there are no counters, the invariants produced 
coincide precisely with the reachability sets. A reachability query therefore 
reduces to asking whether the target belongs to the invariant. 


In the remaining case, the invariant obtained is parametrised by the target via
the bound $C'$. The target lies within the region $(-C', C' + d)$, within which we can
compute all reachable points. Thus once again, the target is reachable precisely
if it belongs to the invariant. However, for a new target of larger modulus, a
different invariant would need to be built.


Complexity 


Lemma 11. Assume that all functions, starting point, and target point are given 
in unary. Then the invariant can be computed in polynomial time. 


Without the unary assumption, the invariant could have exponential size, 
and hence require at least exponential time to compute. That is because the 
invariant we construct could include every value in an interval, for example, 
(—C,C), where C is of size polynomial in the largest value. 

As shown in [10], the reachability problem is at least NP-hard in binary,
because one can encode the integer Knapsack problem (which allows an object to
be picked multiple times rather than at most once). Moreover the Knapsack problem
is efficiently solvable in pseudo-polynomial time via dynamic programming; that
is, polynomial time assuming the input is in unary, matching the complexity of
our procedure.


6 The POROUS Tool 


Our invariant-synthesis tool POROUS⁹ computes N-semi-linear invariants for
point and Z-linear targets on systems defined by one-dimensional affine func- 
tions. POROUS includes implementations of the procedures of Theorem 3 
(restricted to one-dimensional affine systems) and Theorem 6. POROUS is built 
in Python and can be used by command-line file input, a web interface, or by 
directly invoking the Python packages. 

POROUS takes as input an instance (a start point, a target, and a collection of 
functions) and returns the generated invariant. Additionally it provides a proof 
that this set is indeed an inductive invariant: the invariant is a union of N-linear 
sets, so for each linear set and each function, POROUS illustrates the application 
of that function to the linear set and shows for which other linear set in the 
invariant this is a subset. Using this invariant, POROUS can decide reachability; if 
the specific target is reachable the invariant is not in itself a proof of reachability 
(since the invariant will often be an overapproximation of the global reachability 
set). Rather, equipped with the guarantee of reachability, POROUS searches for 
a direct proof of reachability: a sequence of functions from start to target (a 
process which would not otherwise be guaranteed to terminate). 
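The inductiveness proof described above amounts to a table of subset witnesses, one per pair of linear set and function. The following Python sketch shows one way to compute such a table in dimension one. It makes no claim about how POROUS is implemented: the names `subset` and `certificate` are hypothetical, and each N-linear set $\{c + s\mathbb{N}\}$ is encoded as a pair `(c, s)`, with `s == 0` standing for a singleton.

```python
def subset(inner, outer):
    """Is {c1 + s1*N} a subset of {c2 + s2*N}? A step of 0 is a singleton."""
    (c1, s1), (c2, s2) = inner, outer
    if s2 == 0:
        return s1 == 0 and c1 == c2
    base_ok = (c1 - c2) % s2 == 0 and (c1 - c2) // s2 >= 0
    step_ok = s1 == 0 or (s1 % s2 == 0 and s1 * s2 > 0)
    return base_ok and step_ok


def certificate(components, funcs):
    """Map every (component, function) pair to a component containing the
    image, or return None if some image is not covered by a single component."""
    cert = {}
    for comp in components:
        for a, b in funcs:
            image = (a * comp[0] + b, a * comp[1])  # image of {c + s*N} under a*x + b
            witness = next((o for o in components if subset(image, o)), None)
            if witness is None:
                return None
            cert[comp, (a, b)] = witness
    return cert


# Invariant of Example 3 with the counter x -> x + 4 and x -> 2x + 6.
funcs = [(1, 4), (2, 6)]
proof = certificate([(5, 4), (16, 4), (38, 4)], funcs)
```

Requiring each image to fit inside a single component is a sufficient condition for inductiveness, and it is the form of evidence the text describes. Removing $\{38 + 4\mathbb{N}\}$ from the union makes the check fail, since the image of $\{16 + 4\mathbb{N}\}$ under $x \mapsto 2x + 6$ is then uncovered.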


9 Tool: invariants.davidpurser.net Code: github.com/davidjpurser/porous-tool.

Table 2. Results varying by size parameter (last row includes all instances tested).
Times are given in seconds, with the average and maximum shown (except for the
reachability proof time, for which only the average is shown: the maxima are all
approximately 30s, due to instances that terminate just before the timeout).


| Size | Invariant build time (avg) | Invariant build time (max) | Unreachable instances | Invariant proof time (avg) | Invariant proof time (max) | Reachable instances | Reachable instances with proofs within ~30s | Reachability proof time (avg) |
|------|-------|--------|--------------|-------|--------|--------------|--------------|-------|
| 8    | 0.001 | 0.009  | 100 (9.84%)  | 0.005 | 0.261  | 916 (90.2%)  | 911 (99.5%)  | 0.033 |
| 16   | 0.001 | 0.020  | 122 (12.0%)  | 0.010 | 0.788  | 894 (88.0%)  | 885 (99.0%)  | 0.053 |
| 32   | 0.003 | 0.068  | 134 (13.2%)  | 0.020 | 0.911  | 882 (86.8%)  | 843 (95.6%)  | 0.203 |
| 64   | 0.008 | 0.261  | 150 (14.8%)  | 0.052 | 2.969  | 866 (85.2%)  | 766 (88.5%)  | 0.294 |
| 128  | 0.021 | 0.557  | 153 (15.1%)  | 0.096 | 2.426  | 863 (84.9%)  | 719 (83.3%)  | 0.464 |
| 256  | 0.088 | 2.838  | 166 (16.3%)  | 0.316 | 43.587 | 850 (83.7%)  | 620 (72.9%)  | 0.998 |
| 512  | 0.428 | 9.312  | 162 (15.9%)  | 0.899 | 21.127 | 854 (84.1%)  | 570 (66.7%)  | 1.120 |
| 1024 | 1.121 | 20.252 | 173 (17.0%)  | 3.275 | 65.397 | 843 (83.0%)  | 514 (61.0%)  | 1.646 |
| All  | 0.209 | 20.252 | 1160 (14.3%) | 0.584 | 65.397 | 6968 (85.7%) | 5828 (83.6%) | 0.499 |


Experimentation. POROUS was tested on all $2^7 - 1$ possible combinations of
the following function types, with $a \ge 2$, $b \ge 1$: positive counters ($x \mapsto x + b$),
negative counters ($x \mapsto x - b$), growing ($x \mapsto ax + b$), inverting and growing
($x \mapsto -ax + b$), inverters with positive counters ($x \mapsto -x + b$), inverters with
negative counters ($x \mapsto -x - b$) and the pure inverter ($x \mapsto -x$). For each such
combination a random instance was generated, with a size parameter to control
the maximum modulus of $a$ and $b$, ranging between 8 and 1024. The starting
point was between 1 and the size parameter and the target was between 1 and 4
times the size parameter. Ten instances were tested for each size parameter and
each of the $2^7 - 1$ combinations, with between 1 and 9 functions of each type
(with a bias for one of each function type).

Our analysis, summarised in Table 2, illustrates the effect of the size param- 
eter. The time to produce the proof of invariant is separated from the process of 
building the invariant, since producing the proof of invariant can become slower
as $|I|$ becomes larger; it requires finding $L_k \in I$ such that $f_i(L_j) \subseteq L_k$ for every
linear set $L_j \in I$ and every affine function $f_i$. In every case POROUS successfully
built the invariant, and hence decided reachability very quickly (on average well 
below 1s) and also produced the proof of invariance in around half a second on 
average. To demonstrate correctness in instances for which the target is reachable 
POROUS also attempts to produce a proof of reachability (a sequence of functions 
from start to target). Since our paper is focused on invariants as certificates of 
non-reachability, our proof-of-reachability procedure was implemented crudely 
as a simple breadth-first search without any heuristics, and hence a timeout of 
30s was used for this part of the experiment only. 
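
The proof-of-invariant check described above can be illustrated in one dimension. The following is a minimal sketch and not the implementation used in POROUS: it assumes that every linear set is a full residue class $c + p\mathbb{Z}$ and that every function has the form $x \mapsto ax + b$, given as a pair (a, b). The names `ResidueClass`, `image_contained`, `is_inductive` and `contains`, as well as the example instance, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ResidueClass(NamedTuple):
    """The set {c + k*p : k in Z}, a simplified one-dimensional linear set."""
    c: int  # base point
    p: int  # period, assumed p > 0


def image_contained(f: Tuple[int, int], src: ResidueClass, dst: ResidueClass) -> bool:
    """Check f(src) ⊆ dst for the affine function f(x) = a*x + b.

    f maps c + k*p to (a*c + b) + k*(a*p), so containment holds exactly when
    the image of the base point lies in dst and a*p is a multiple of dst.p.
    """
    a, b = f
    return (a * src.c + b - dst.c) % dst.p == 0 and (a * src.p) % dst.p == 0


def is_inductive(invariant: List[ResidueClass], functions: List[Tuple[int, int]]) -> bool:
    """Every function must map every set of the invariant into some set of it."""
    return all(
        any(image_contained(f, src, dst) for dst in invariant)
        for src in invariant
        for f in functions
    )


def contains(invariant: List[ResidueClass], x: int) -> bool:
    """Membership of a point in the union of the residue classes."""
    return any((x - s.c) % s.p == 0 for s in invariant)


# Hypothetical instance: x ↦ 3x + 1 and x ↦ x + 4, starting from 0.
functions = [(3, 1), (1, 4)]
invariant = [ResidueClass(0, 4), ResidueClass(1, 4)]
start, target = 0, 6
certifies_unreachable = (
    contains(invariant, start)
    and is_inductive(invariant, functions)
    and not contains(invariant, target)
)
```

In this sketch the invariant contains the start, is closed under both functions, and excludes the target, so it serves as a certificate of non-reachability. The number of containment checks grows with the number of sets in the invariant, which is consistent with the proof step becoming slower as $|I|$ grows.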
Our experimental methodology was partially limited due to the high prevalence
of reachable instances. A random instance will likely exhibit a large (often
universal) reachability set. When two random counters are included, the chance
that gcd(b1, b2) = 1 (whence the whole space is covered) is around 60.8% and
higher if more counters are chosen.
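
The figure of 60.8% is consistent with the classical density $6/\pi^2 \approx 0.608$ of coprime pairs of integers. The sketch below compares this limit with an exhaustive count over a finite range; the bound of 500 is an arbitrary choice for illustration and is not the distribution used in the experiments.

```python
from math import gcd, pi


def coprime_fraction(n: int) -> float:
    """Fraction of pairs (b1, b2) with 1 <= b1, b2 <= n and gcd(b1, b2) == 1."""
    coprime = sum(
        1
        for b1 in range(1, n + 1)
        for b2 in range(1, n + 1)
        if gcd(b1, b2) == 1
    )
    return coprime / (n * n)


limit_density = 6 / pi**2          # asymptotic density of coprime pairs
empirical = coprime_fraction(500)  # exhaustive count for an illustrative bound
```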

Overall around 86% of instances were reachable (of which 84% produced a 
proof within 30s). Of the 14% of unreachable instances, all produced a proof, 
with the invariant taking around 0.2 s to build and 0.6 s to produce the proof. The
30 s timeout when demonstrating reachability directly is about two orders of magnitude
longer than answering the reachability query via our invariant-building
method.

A typical academic/consumer laptop was used to conduct the timing and 
analysis (a four-year-old, four-core MacBook Pro). 


7 Conclusions and Open Directions 


We introduced the notion of porous invariants, which are not necessarily convex 
and can in fact exhibit infinitely many ‘holes’, and studied these in the context 
of multipath (or branching/nondeterministic) affine loops over the integers, or 
equivalently nondeterministic integer linear dynamical systems. We have in par- 
ticular focused on reachability questions. Clearly, the potential applicability of 
porous invariants to larger classes of systems (such as programs involving nested 
loops) or more complex specifications remains largely unexplored. 

Our focus is on the boundary between decidability and undecidability, leav- 
ing precise complexity questions open. Indeed, the complexity of synthesising 
invariants could conceivably be quite high, except where we have highlighted 
polynomial-time results. On the other hand, the invariants produced should be 
easy to understand and manipulate, from both a human and machine perspective. 

On a more technical level, in our setting the most general class of invariants 
that we consider are N-semi-linear. There remains at present a large gap between 
decidability for one-dimensional affine functions, and undecidability for linear 
updates in dimension 91 and above. It would be interesting to investigate whether 
decidability can be extended further, for example to dimensions 2 and 3. 
Abstract. Satisfiability Modulo Theories (SMT) is an enabling technology
with many applications, especially in computer-aided verification. Due to 
advances in research and strong demand for solvers, there are many SMT 
solvers available. Since different implementations have different strengths, 
it is often desirable to be able to substitute one solver by another. Un- 
fortunately, the solvers have vastly different APIs and it is not easy to 
switch to a different solver (lock-in effect). To tackle this problem, we 
developed JavaSMT, which is a solver-independent framework that unifies 
the API for using a set of SMT solvers. This paper describes version 3 
of JavaSMT, which now supports eight SMT solvers and offers a simpler 
build and update process. Our feature comparisons and experiments show 
that different SMT solvers significantly differ in terms of feature support 
and performance characteristics. A unifying Java API for SMT solvers is 
important to make the SMT technology accessible for software developers. 
Similar APIs exist for other programming languages. 


Keywords: Satisfiability Modulo Theories · SMT Solver · Java · API


1 Introduction 


SMT solvers [6, 21] are used in a multitude of applications, e.g., in formal software 
analysis, where automated test-case generation [7, 16, 29,30], SMT-based algo- 
rithms for software verification [10,34], and interactive theorem proving [27, 44] 
are used. Applications and users rely on efficiency and expressiveness (sup- 
ported SMT theories) to compute reasonable results in time. For application 
developers, the usability and API of the solver are also important aspects, and 
some features needed in applications, such as interpolation or optimization, 
are not available in some solvers. 

Using the solver’s own API directly makes it difficult to switch to another 
solver without rewriting extensive parts of the application, as there is no standardized
binary API for SMT solvers. The SMT-LIB2 standard [4] mitigates
this issue by defining a common language to interact with SMT solvers. However,
this communication channel does not define a solver interface for special
features like optimization or interpolation.¹ Additionally, the application has to
parse the data provided by the SMT solver on its own, and this of course 
slightly changes from solver to solver. 


1 A proposal for adding interpolation queries has existed since 2012, see
https://ultimate.informatik.uni-freiburg.de/smtinterpol/proposal.pdf.
© The Author(s) 2021 
JavaSMT [37] provides a common API layer across multiple back-end solvers
to address these problems. Our Java-based approach creates only minimal overhead,
while giving access to most solver features. JavaSMT is available under
the Apache 2.0 License on GitHub.²


Contribution. Our contribution consists of three parts: 


• We integrated more SMT solvers into the API framework JavaSMT (new:
Boolector [43], CVC4 [5], and Yices2 [25]).

• We simplified the steps to get started using JavaSMT, by including support
for more operating systems (new: MacOS and Windows) and more build
techniques (new: Ant and Maven).

• We evaluated the performance of several algorithms for software verification
to show that different SMT solvers have different strengths.


Outline. This paper first provides a brief overview of JAVASMT in Sect. 2, ex- 
plaining the inner structure and features. Sect. 3 discusses the development since 
the previous publication [37]: more integrated SMT solvers and extended support 
for operating systems and build processes. Sect. 4 describes a case study, based 
on SMT-based algorithms [10] in a common verification framework. 


Related Work. SMT-LIB2 [4] is the established standard format for exchanging 
SMT queries. It provides simple usage, is easy to debug, and widely known in 
the community. However, it requires extra effort to parse and transform formulas 
in the user application. Features like optimization, interpolation, and receiving 
nested parts of formulas are not defined by the standard, such that some SMT 
solvers provide their own individual solution for that. Alternatively, several SMT 
solvers already come with their special bindings for some programming languages. 
Most SMT solvers are written in C/C++, so interacting with them in these 
low-level languages is the easiest way. However, the support for higher-level 
languages is sparse. The most prominent language binding for several SMT 
solvers is Python, as it directly allows the access to C code and avoids automated 
memory management operations like asynchronous garbage collection. Bindings 
for Java are available for some SMT solvers, such as MathSAT5 and Z3, but
missing, unsupported, or unmaintained for others, such as Boolector and CVC4.

In the following, we discuss libraries, similar to JAVASMT, that provide access 
to several underlying SMT solvers via a common user interface in different popular 
languages, and their binding mechanism, i.e., whether the solver interaction is 
based on a native interface or text-based on SMT-LIB2. With SMT-LIB2, an ar- 
bitrary SMT solver can be queried, but the interaction happens through communi- 
cating processes and the solver is mostly limited to features defined in the standard. 
Accessing a native interface directly makes it possible to support more features of the underlying
solver, e.g., using callbacks, simplifying formulas, or eliminating quantifiers.

Table 1 provides an overview of the libraries for interacting with SMT solvers. 
We enumerate several special features that are not available in some libraries, 


2 https://github.com/sosy-lab/java-smt
Table 1: Comparison of different interface libraries for SMT solvers

[Binding-mechanism and feature columns (check marks) are not legible in this copy.]

Library | Language | Forks | Stars | Year
JavaSMT [37] | Java | 22 | 90 | 2021
PySMT [28] | Python | 99 | 363 | 2021
SMT Kit | C/C++ | 4 | 36 | 2014
smt-switch [38] | C/C++ | 15 | 40 | 2021
jSMTLIB [20] | Java | 15 | 21 | 2020
metaSMT [45] | C/C++ | 19 | 43 | 2016
rsmt2 | Rust | 10 | 24 | 2021
SBV | Haskell | 17 | 134 | 2021
Scala SMT-LIB | Scala | 18 | 44 | 2021
ScalaSMT [17] | Scala | 1 | 4 | 2019
what4 | Haskell | 5 | 97 | 2021


such as unsat cores, interpolation, or optimization queries. Those features depend 
on the support by the underlying SMT solver, but can be provided in general 
by an API on top of them. Most libraries use their own formula representation 
and not just wrap the objects provided by the SMT solver. This potentially 
allows for easier formula decomposition and inspection, e.g., by using the visitor 
pattern. JAVASMT directly provides formula decomposition if available in the 
SMT solver. The provided numbers of forks and stars of the project repositories 
on GitHub or Bitbucket can be seen as a measurement of popularity. 

PySMT [28] is a Python-based project and aims at rapid prototyping of
algorithms using the native API of the installed SMT solvers. It has the ability to
perform formula manipulation without a back-end SMT solver and additionally
supports the conversion of boolean formulas to plain SAT problems, to which it then
applies a SAT solver or a BDD library. This approach comes with the drawback
of the noticeable memory overhead and performance of an interpreted language.
metaSMT [45], SMT Kit, and smt-switch [38] provide solver-agnostic APIs for
interacting with various SMT solvers in C/C++ to focus on the application instead
of the solver integration. jSMTLIB [20], Scala SMT-LIB, and ScalaSMT [17] are
solver-independent libraries written in Java or Scala and interact via SMT-LIB2
with SMT solvers. Scala SMT-LIB and ScalaSMT allow the use of an additional
domain-specific language to interact with SMT solvers and rewrite Scala syntax
into valid SMT-LIB2 and back. Both partially extend the SMT-LIB2 standard,
e.g., by offering the ability to overload operators or receive interpolants. SBV
and what4 are generic Haskell libraries based on process interaction via SMT-LIB2
and support several SAT and SMT solvers. rsmt2 offers a generic Rust
library that currently supports three SMT solvers.
2 JavaSMT’s Architecture and Solver Integration


In the following, we describe the architecture of JAVASMT and its main con- 
cepts. Afterwards, we give an overview of the integrated SMT solvers and their 
features. The architecture did not significantly change, but we added a few 
new SMT solvers, as shown in Fig. 1. 


Architecture. JavaSMT provides a common API for various SMT solvers. The 
architecture, shown in Fig. 1, consists of several components: As common context, 
we use a SolverContext that loads the underlying SMT solver and defines the 
scope and lifetime of all created objects. As long as the context is available, 
we track memory regions of native SMT-solver libraries. When the context is 
closed, the corresponding memory is freed and garbage collection wipes all unused 
objects. Within a given context, JavaSMT provides FormulaManagers for creating 
formulas in various theories and ProverEnvironments for solving SMT queries. 

A FormulaManager allows creating symbols and formulas in the corresponding
theories and provides a type-safe way to combine symbols and formulas
in order to encode a more complex SMT query. We support the structural 
analysis (like splitting a formula into its components or counting all function 
applications in a formula) and transformations (like substituting symbols or 
applying equisatisfiable simplifications) of formulas. 
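
The kinds of structural analysis and transformation mentioned above can be sketched on a small term representation. This is an illustration only and does not use or mirror the JavaSMT API: the classes `Var` and `App` and the functions `count_applications`, `substitute` and `conjuncts` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    """Application of a function symbol (e.g. 'and', '<', 'f') to arguments."""
    func: str
    args: Tuple["Term", ...]


Term = Union[Var, App]


def count_applications(term: Term) -> int:
    """Count all function applications occurring in a term."""
    if isinstance(term, Var):
        return 0
    return 1 + sum(count_applications(arg) for arg in term.args)


def substitute(term: Term, mapping: Dict[str, Term]) -> Term:
    """Replace variables by terms, rebuilding the formula bottom-up."""
    if isinstance(term, Var):
        return mapping.get(term.name, term)
    return App(term.func, tuple(substitute(arg, mapping) for arg in term.args))


def conjuncts(term: Term) -> List[Term]:
    """Split a (possibly nested) top-level conjunction into its components."""
    if isinstance(term, App) and term.func == "and":
        return [c for arg in term.args for c in conjuncts(arg)]
    return [term]


x, y, z = Var("x"), Var("y"), Var("z")
formula = App("and", (App("<", (x, y)), App("=", (App("f", (x,)), y))))
renamed = substitute(formula, {"x": App("+", (z, z))})
```

Each function is a recursive traversal of the term, which corresponds to what a visitor over the solver's formula objects would do when the solver exposes operators and operands.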

Each ProverEnvironment represents a solver stack and allows pushing and popping
boolean formulas and checking them for satisfiability (the hard part). This follows
the idea of incremental solving (if the underlying SMT solver supports it). After a 
satisfiability check, the ProverEnvironment provides methods to receive a model, 
interpolants, or an unsatisfiable core for the given formula. 

JavaSMT guarantees that formulas built with a single FormulaManager
can be used in several ProverEnvironments, e.g., the same formula can be 
pushed onto and solved within several distinct ProverEnvironments. The in- 
teraction with independent ProverEnvironments works from multiple threads. 
However, some SMT solvers require synchronization (e.g., locking for an in- 
terleaved usage) and other solvers do not require external synchronization 
(this allows concurrent usage). 
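
The stack discipline of a prover environment can be illustrated with a toy model. The sketch below is not JavaSMT code and does not reproduce its method names: `ToyProverStack` is a hypothetical class that stores propositional constraints per push level and decides satisfiability by enumerating all assignments, which stands in for the call to a real SMT solver.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]


class ToyProverStack:
    """Toy stand-in for an incremental solver stack (brute force, Boolean only)."""

    def __init__(self) -> None:
        self._levels: List[List[Constraint]] = [[]]

    def add(self, variables: Sequence[str], predicate: Callable[..., bool]) -> None:
        """Assert a constraint over the named variables at the current level."""
        self._levels[-1].append((tuple(variables), predicate))

    def push(self) -> None:
        self._levels.append([])

    def pop(self) -> None:
        """Discard all constraints added since the matching push."""
        if len(self._levels) == 1:
            raise IndexError("pop without matching push")
        self._levels.pop()

    def get_model(self) -> Optional[Dict[str, bool]]:
        """Return a satisfying assignment, or None if the stack is unsatisfiable."""
        constraints = [c for level in self._levels for c in level]
        names = sorted({v for vs, _ in constraints for v in vs})
        for values in product([False, True], repeat=len(names)):
            model = dict(zip(names, values))
            if all(pred(*(model[v] for v in vs)) for vs, pred in constraints):
                return model
        return None

    def is_unsat(self) -> bool:
        return self.get_model() is None


prover = ToyProverStack()
prover.add(["a", "b"], lambda a, b: a or b)
prover.push()
prover.add(["a"], lambda a: not a)
prover.add(["b"], lambda b: not b)
unsat_inner = prover.is_unsat()   # a or b contradicts (not a) and (not b)
prover.pop()
model_outer = prover.get_model()  # only "a or b" remains on the stack
```

The point of the sketch is that constraints asserted before a push survive the pop, so the earlier part of a query does not have to be rebuilt between checks.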


SMT-Solver Integration and Bindings. Of the eight SMT solvers that are available 
in JAVASMT, only Princess [46] and SMTINTERPOL [18] were ‘easy’ to integrate, 
as they are written in Scala and Java, respectively. Those solvers also use 
the available memory management and garbage collection of the Java Virtual 
Machine (JVM). All other solvers are written in C/C++ and need a Java Native 
Interface (JNI) wrapper to interface with JavaSMT. Z3 [40] and CVC4 [5] 
provide their own Java wrappers, while the bindings used for MathSAT5 [19],
Boolector [42], and Yices2 [25] are maintained by us. Those bindings are
self-written or partially based on a version from the solver developers, extended
with exception handling, and usable for debugging in JavaSMT. By providing
language bindings for solvers in our library, we relieve the solver developers 
from this burden, and the implementation of exception handling and memory 
management is done in an efficient and common manner across several solvers. 
Fig. 1: Overview of JavaSMT


Table 2: Size (LOC) of the Java-based solver wrappers and native solver bindings

Solver | Boolector | CVC4 | MathSAT5 | OptiMathSAT | Princess | SMTInterpol | Yices2 | Z3
Java-based Wrapper | 1644 | 1918 | 3229 | 3229 | 2042 | 2117 | 2728 | 2674
JNI Bindings | 3136 1388 1508 1598


Table 2 lists the size (lines of code) of the wrappers to integrate each solver 
in JavaSMT, in order to get a rough impression of the required effort to get a 
solver and its bindings usable in JavaSMT. The size information consists of two 
parts, namely the JNI bindings that are written in C/C++ and the Java code 
that implements the necessary interfaces of JavaSMT. An expressive solver API 
(like MathSAT5 or OptiMathSAT [47]) needs more code for their binding, with
only a small increment in complexity compared to other solver bindings. 

Note that the evolution of JavaSMT depends on the evolution of the underlying 
SMT solvers. Z3 is well-known, has a large user group, and an active development
team. Yet, interpolation support for Z3 was dropped with release 4.8.1.³
Bitwuzla [41] is the successor of the SMT solver Boolector, for which the
developers still provide small fixes. Bitwuzla can be supported in JavaSMT in
the future. CVC4 has been developed further to CVC5. However, the maintainers


3 https://github.com/Z3Prover/z3/releases/tag/z3-4.8.1
dropped the existing Java API, partially because of issues with the Java garbage
collection, and plan to replace it.⁴ Yices2 is also actively maintained and adds
new features regularly. For example, the developers added support for third-party
SAT solvers such as CaDiCaL and CryptoMiniSat [48].


3 New Contributions in JavaSMT 3 


This section describes the improvements over the JavaSMT version from five
years ago [37], split into two parts. First, we describe newly integrated solvers 
and theory features. Second, we provide information about the build process. 


Support for Additional SMT Solvers. JavaSMT 3 provides access to eight SMT
solvers. Besides the solvers that were already integrated before, MathSAT5,
OptiMathSAT, Z3, Princess, and SMTInterpol, the user can now additionally
use Boolector, CVC4, and Yices2. Table 3 lists available theories and important
features supported by each individual solver. Boolector is specialized in
Bitvector-based theories, but does not support the Integer theory. It is shipped
with several back-end SAT solvers, from which the user can choose a favorite:
CaDiCaL, CryptoMiniSat [48], Lingeling, MiniSat [26], and PicoSAT [13]. All
solvers support the input of plain SMT-LIB2 formulas. However, the feature
most requested by JavaSMT users is the input and output of SMT queries via
the API, i.e., parsing and printing boolean formulas for a given context. This
feature is required for (de-)serializing formulas to disk, for network transfer, and
to translate formulas from one solver to another one. This feature is unfortunately
missing for the newly integrated solvers, even though each solver internally
already contains code for parsing and printing SMT-LIB2 formulas.

For formula manipulation, JavaSMT accesses the components of a formula, 
e.g., operators and operands. We do not require full access to the internal data 
structures of the SMT solvers, but only limited access to the most basic parts. 
Only Boolector does not provide the necessary API.


Build Simplification. JavaSMT 3 also supports more operating systems than 
before. Besides the existing support for Linux, we started to provide pre-compiled 
binaries for MacOS and Windows for more than half of the available solvers. 
This simplifies the initial steps for new users, who were previously required to
compile and link the solvers on their own. This was an involved task because
of the diversity of build systems and dependencies of each solver.

In addition to this, we now offer direct support for two popular build sys- 
tems for Java applications, namely Ant and Maven. JAVASMT comes with 
several examples and documentation, such that the mentioned build systems 
can be used to set up JAVASMT in a ready-to-go state on most systems. This 
eliminates the need for complex manual setup of dependencies and eases the
use of JavaSMT and the SMT solvers.


4 https://github.com/cvc5/cvc5/issues/5018
Table 3: SMT theories and features supported by SMT solvers in JavaSMT 3


4 Evaluation 


Frameworks that provide a unified API to SMT solvers (such as JavaSMT,
PySMT, and ScalaSMT) are necessary because the characteristics of the SMT
solvers vary a lot. In the evaluation we provide support for this argument. 
We inlined a discussion of the features already in the previous section. Table 3 
provides the overview of supported theories and shows that certain theories are 
available only for a subset of SMT solvers. The table also shows that there are 
several features that restrict the choice of SMT solvers for certain applications. 
In terms of performance, we evaluate JavaSMT 3 as a component of
CPAchecker [11], which is an open-source software-verification framework⁵
that provides a range of different SMT-based algorithms for program analysis [10] 
and encoding techniques for program control flow [8,12]. We compare three 
well-known and successful SMT-based algorithms for software model checking 
and show that — when using the same algorithm and identical problem encoding 
— the performance result of an analysis depends on the used SMT solver. Some 


5 https://cpachecker.sosy-lab.org
algorithms depend on special features of the SMT solver, e.g., to provide a certain
type of formula (such as interpolants) and operation on a formula (such as access
to subformulas). There are SMT solvers that cannot be used for some algorithms.

We aim to show that depending on the feature set of the SMT solvers, it is 
important to support a common API, and additionally, that using the text-based 
interaction via SMT-LIB2 is not an efficient solution, when it comes to formula 
analysis like adding additional information into a formula. 


Benchmark Programs. We evaluate the usage of JavaSMT on a large subset
of the SV-benchmark suite⁶ containing over 1000 verification tasks. To have
a broad variation of benchmark tasks, we include reachability problems from 
the categories BitVectors, ControlFlow, Heap, and Loops. 

BitVectors depends on bit-precise reasoning and thus, the SMT solver needs 
to support Bitvector logic. Heap depends on modeling heap memory access,
which is encoded, e.g., either in the theory of Arrays or as Uninterpreted Functions.
The category Loops contains tasks where the state space is potentially quite large. 


Experimental Setup. We run all our experiments on computers with Intel Xeon 
E3-1230 v5 CPUs with 3.40 GHz, and limit the CPU time to 15 min and the
memory to 15 GB. We use CPAchecker revision 136714, which internally uses
JavaSMT 3.7.0-73. The time needed for transforming the input program into 
SMT queries is rather small compared to the analysis time. Additionally, the 
progress of an algorithm depends on the result (e.g., model values or interpolants) 
returned from an SMT solver, thus we do not explicitly extract the run time 
required by the SMT solver itself for answering the satisfiability problem, but we 
measure the complete CPU time of CPAchecker for the verification run.


Analysis Configuration. We use three different SMT-based algorithms for software 
verification [10]. The first approach is bounded model checking (BMC) [14, 15],
which has been applied in software and hardware model checking for many years. In this
approach, a verification problem is encoded as a single large SMT query and given
to the SMT solver. No further interaction with the SMT solver is required. In our
evaluation, we use a loop bound k = 10, which limits the size of the SMT query.
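
The idea of encoding a bounded verification problem as one query can be sketched by unrolling a transition relation k times and emitting SMT-LIB2 text. This is a simplified illustration and not the encoding used by CPAchecker: it assumes a single integer state variable, and the function `bmc_query`, its template parameters and the example program (a counter that starts at 0, increases by 2, and must never equal 7) are hypothetical.

```python
def bmc_query(init: str, trans: str, bad: str, k: int) -> str:
    """Build one SMT-LIB2 script for a bounded model checking query.

    `init` and `bad` are templates over "{x}"; `trans` is a template over
    "{x}" (current state) and "{y}" (next state).
    """
    lines = ["(set-logic QF_LIA)"]
    # One copy of the state variable per unrolling step.
    lines += [f"(declare-const x{i} Int)" for i in range(k + 1)]
    lines.append("(assert " + init.format(x="x0") + ")")
    for i in range(k):
        step = trans.format(x=f"x{i}", y=f"x{i + 1}")
        lines.append("(assert " + step + ")")
    # The property is violated if some unrolled state is bad.
    violations = [bad.format(x=f"x{i}") for i in range(k + 1)]
    if len(violations) == 1:
        disjunction = violations[0]
    else:
        disjunction = "(or " + " ".join(violations) + ")"
    lines.append("(assert " + disjunction + ")")
    lines.append("(check-sat)")
    return "\n".join(lines)


script = bmc_query(
    init="(= {x} 0)",
    trans="(= {y} (+ {x} 2))",
    bad="(= {x} 7)",
    k=10,
)
```

For this example every unrolled state is even, so a solver given `script` is expected to answer unsat, meaning that no violation exists within the bound. The script grows linearly with k, which is why the loop bound limits the size of the query.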

The second approach is k-induction [9, 24], which extends BMC, and which 
uses auxiliary invariants to strengthen the induction hypothesis. In this approach, 
the algorithm generates several SMT queries (base case, inductive-step case, each 
with increasing loop bound) and uses an invariant generator that provides the 
auxiliary invariants. We use an interval-based invariant generator that provides 
not only the invariants, but also information about pointers and aliases, which 
must be inserted into the SMT formula using the formula visitor. 
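
The interplay of base case, inductive step and auxiliary invariant can be illustrated on a small finite-state system. The sketch below replaces the SMT queries by explicit enumeration and assumes a deterministic transition function, so it shows the proof rule only and not the implementation evaluated here; the function `k_induction` and the example system (a counter modulo 8 that increases by 2) are hypothetical, and the auxiliary invariant is assumed to be valid, as if supplied by an invariant generator.

```python
from typing import Callable, Iterable, Optional


def k_induction(
    states: Iterable[int],
    init: Callable[[int], bool],
    step: Callable[[int], int],
    prop: Callable[[int], bool],
    k: int,
    invariant: Callable[[int], bool] = lambda s: True,
) -> Optional[bool]:
    """Return True if prop is proven, False if the base case finds a violation,
    and None if the inductive step fails (inconclusive for this k)."""
    states = list(states)
    # Base case: the first k states of every run from an initial state satisfy prop.
    for s in states:
        if not init(s):
            continue
        cur = s
        for _ in range(k):
            if not prop(cur):
                return False
            cur = step(cur)
    # Inductive step: k consecutive states that satisfy prop (and the auxiliary
    # invariant) must be followed by a state that satisfies prop.
    for s in states:
        cur, hypothesis_holds = s, True
        for _ in range(k):
            if not (prop(cur) and invariant(cur)):
                hypothesis_holds = False
                break
            cur = step(cur)
        if hypothesis_holds and not prop(cur):
            return None
    return True


def check(k: int, invariant: Callable[[int], bool] = lambda s: True) -> Optional[bool]:
    """Counter modulo 8 starting at 0 and increasing by 2; property: never 7."""
    return k_induction(
        range(8),
        init=lambda s: s == 0,
        step=lambda s: (s + 2) % 8,
        prop=lambda s: s != 7,
        k=k,
        invariant=invariant,
    )


plain_k1 = check(1)
plain_k4 = check(4)
strengthened_k1 = check(1, invariant=lambda s: s % 2 == 0)
```

Without an auxiliary invariant the step case fails for small k, because unreachable odd states such as 5 satisfy the property but lead to 7; the proof only succeeds once k is increased to 4. With the auxiliary invariant "the counter is even", the property is already proven for k = 1, which illustrates how auxiliary invariants strengthen the induction hypothesis.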

The third approach is predicate abstraction [3, 12,31, 35], which uses Craig 
interpolation [22, 32, 39] to compute predicate abstractions of the program. This 
approach does not only query the SMT solver multiple times, but also uses 
(sequential) interpolation, which is currently supported only by MathSAT5,
Princess, and SMTINTERPOL. 


6 https://github.com/sosy-lab/sv-benchmarks
Fig. 2: Quantile plot for the runtime of k-induction with several SMT solvers


All approaches are executed in two configurations, depending on the used
encoding of program statements: First, we apply a bitvector-based encoding that
precisely models the bit-precise arithmetic and overflows of the program. Second,
an encoding based on linear integer arithmetic is used, which approximates the
concrete program execution and is sufficient for some programs.


Solver Configuration. Overall, we aim to show that each solver provides a unique 
fingerprint of features and results. We aim for a precise program analysis and 
thus configure the SMT solvers to be as precise as possible, but with a rea- 
sonable configuration for each solver (i.e., without using a feature combination 
that is unsupported by the SMT solver). 

SMTINTERPOL does not support efficient solving of SMT queries in Bitvector 
logic, thus, it is configured to use only Integer logic. Boolector lacks Integer
logic, thus, it is applied only to the bit-precise configurations. Additionally, this 
SMT solver does not support formula inspection and decomposition, which is 
required by several components in k-induction, e.g., to encode proper pointer 
aliasing for the program analysis. While the code for formula inspection is called 
quite often, its influence on the results for the selected benchmark tasks is small. 
In order to be comparable as far as possible, we deactivate pointer aliasing when 
using Boolector. Yices2 lacks proper support for Array logic, thus, we use a
UF-based encoding of heap memory as alternative for this solver, which results 
in a slightly unsound analysis, but a comparable formula size and run time. 


Results and Discussion. Figure 2 provides the quantile plot for the results of 
k-induction configurations with bit-precise encoding using several SMT solvers. 
The plot shows the CPU time for valid analysis results, i.e., proofs or counterexamples
found, for both expected results true and false. We aim at providing all
results that are useful for a user and do not show results where the tool (or SMT
solver) crashes or runs out of resources. We do not subtract the run time required
for the framework CPACHECKER itself (which starts a Java virtual machine), as 
we assume it to be comparable per program task; we are only interested in the 
asymptotics in this evaluation. The overall performance of SMT solvers is similar 
for simple verification tasks, i.e., those with a small run time in the analysis. For 
difficult tasks with harder SMT queries, the differences of the SMT solvers emerge. 
When applying k-induction, the analysis inserts additional constraints into the 
Table 4: Run time for using different SMT solvers for bounded model checking
(‘BMC’), k-induction (‘KI’), and predicate abstraction (‘PA’) with the theories
of Bitvectors (‘BV’) and Integers (‘Int’); CPU time given in seconds with two
significant digits, ‘TO’ indicates timeouts (900 s), ‘ERR’ indicates errors, and
empty cells indicate that the theory or interpolation was not supported
Algorithm | BMC | BMC | KI | KI | KI | KI | PA | PA | PA | PA
Encoding | Int | BV | Int | Int | BV | BV | Int | Int | BV | BV
Boolector | | 5.8 | | | ERR | ERR | | | |
CVC4 | 340 | 6.4 | TO | TO | 110 | TO | | | |
MathSAT5 | 17 | 7.8 | 200 | 53 | 60 | 54 | TO | 11 | 16 | 7.1
Princess | TO | TO | 530 | TO | 260 | TO | 38 | 160 | TO | ERR
SMTInterpol | 50 | | TO | 140 | | | TO | 13 | |
Yices2 | 14 | 7.7 | 340 | 23 | 34 | 28 | | | |
Z3 | 15 | 6.7 | 130 | 66 | 43 | 21 | | | |


SMT formula and requires the SMT solver to allow access to components of 
existing formulas. As Boolector lacks this specific feature, k-induction cannot
be very effective here. Other SMT solvers are the preferred choice. 

Table 4 contains some example tasks from all used algorithms and encodings, 
where the difference between distinct SMT solvers is noteworthy. Choosing the
optimal SMT solver for an arbitrary problem task is not obvious.


5 Conclusion 


We contribute JAVASMT 3, the third generation of the unifying Java API for 
SMT solvers. The package now contains more SMT solvers, an improved build 
process, and support for MacOS and Windows. The project has over 20 contributors,
2500 commits, and overall about 41000 lines of code. JavaSMT is
used in Java applications (e.g., [23, 33, 36]) as a solution to combine convenience 
and performance for the interaction with SMT solvers, or to switch between 
different solvers and compare them [11, 49]. The most prominent application using 
JavaSMT is the verification framework CPAchecker (a widely-used software
project with 73 forks on GitHub alone), for which JavaSMT was originally
developed. In the future, we plan to support more SMT solvers, operating sys- 
tems, and hardware architectures, while keeping the user interface stable. We 
hope that even more researchers and developers of Java applications can benefit 
from SMT solving via a convenient and powerful API. 


Data Availability Statement. All benchmark tasks for evaluation, configuration 
files, a ready-to-run version of our implementation, and tables with detailed 
results are available in our reproduction package on Zenodo as virtual machine [1] 
and as ZIP archive [2]. The source code of the open-source library JavaSMT [37] 
is available in the project repository; see https://github.com/sosy-lab/java-smt. 


Funding. This project was supported by the Deutsche Forschungsgemeinschaft 
(DFG) — 378803395 (ConVeY). 


Abstract. The process of developing civil aircraft and their related sys- 
tems includes multiple phases of Preliminary Safety Assessment (PSA). 
An objective of PSA is to link the classification of failure conditions and 
effects (produced in the functional hazard analysis phases) to appro- 
priate safety requirements for elements in the aircraft architecture. A 
complete and correct preliminary safety assessment phase avoids poten- 
tially costly revisions to the design late in the design process. Hence, 
automated ways to support PSA are an important challenge in modern 
aircraft design. A modern approach to conducting PSAs is via the use of 
abstract propagation models, that are basically hyper-graphs where arcs 
model the dependency among components, e.g. how the degradation of 
one component may lead to the degraded or failed operation of another. 
Such models are used for computing failure propagations: the fault of a 
component may have multiple ramifications within the system, causing 
the malfunction of several interconnected components. A central aspect 
of this problem is that of identifying the minimal fault combinations, 
also referred to as minimal cut sets, that cause overall failures. 

In this paper we propose an expressive framework to model failure 
propagation, catering for multiple levels of degradation as well as cyclic 
and nondeterministic dependencies. We define a formal sequential seman- 
tics, and present an efficient SMT-based method for the analysis of failure 
propagation, able to enumerate cut sets that are minimal with respect to 
the order between levels of degradation. In contrast with the state of the 
art, the proposed approach is provably more expressive, and dramatically 
outperforms other systems when a comparison is possible. 


1 Introduction 


The process of developing civil aircraft and their related systems is guided by 
documents ARP4754A [17] and ARP4761 [16] produced by the engineering and 
standards organization SAE International. These documents describe a struc- 
tured process for the safety assessment of these classes of platforms. An impor- 
tant stage is that of the Preliminary Aircraft Safety Assessment (PASA) and 
Preliminary System Safety Assessment (PSSA). The PASA is followed by multiple
PSSA, carried out at the level of the systems composing the aircraft. One
important goal of these process stages is to link the classification of failure con- 
ditions and effects (produced in the aircraft functional hazard analysis phase) to 
appropriate safety requirements for elements in the aircraft architecture. These 
safety requirements drive, among other things, assignment of target Develop- 
ment Assurance Levels (DAL) for items within the architecture. A complete and 
correct preliminary safety assessment phase avoids potentially costly revisions 
to the design late in the design process. Hence, automated ways to support PSA 
are an important challenge in modern aircraft design [18]. 

An important goal of PSAs is to fully understand how faults of simple func- 
tions (e.g. providing electrical power, on-ground braking) interact and propagate 
to affect the overall behaviours (e.g. landing, take-off, taxiing). A modern
approach to conducting such safety assessments is via propagation models [1, 14, 19],
which model the dependency among components, e.g. how the degradation of one
component may lead to the degraded or failed operation of another. Such models
are used for computing failure propagations: the fault of a component may
have multiple ramifications within the system, causing the malfunction of several 
interconnected components. A central problem is identifying the minimal fault 
combinations, also referred to as minimal cut sets, that cause overall failures [12]. 

Given that PSAs occur in the early stages of the development process when 
limited information regarding the design is available, reasoning is carried out at 
a very high level of abstraction. Therefore, instead of using behavioural models 
(e.g., infinite-state transition systems) adopted in formal verification, the system 
is more naturally modeled by a simpler formalism of propagation graphs. This 
does not make PSA any easier. There are in fact several aspects that must be 
taken into account. The first problem is the sheer size of propagation graphs, 
both in terms of nodes and hyper-paths to be explored, which make enumerative 
techniques completely inadequate. 

Second, the propagation is non-Boolean [19]. That is, the degradation levels of
the system functions are not binary (working vs not working) but the functions
may be subject to different levels of degradation (e.g. fully operational, partly
failed, completely failed), and fail in different ways (e.g. detected vs undetected,
stuck open vs stuck closed), and different failures may be associated to different
probabilities [19]. For example, the state of a component can be abstractly
modeled into working (w), failed safe (fs), failed detected (fd), or failed
undetected (fu), with degrees of degradation partially ordered as shown in Fig. 1.

Fig. 1. Hasse diagram of the FDS W3F [14].

In this setting, the notion of minimality needs to take into account the order 
among the levels of degradation, and cannot be simply considered in terms of
minimality with respect to set-inclusion. Third, various forms of failure propagation
may be possible, e.g., nondeterministic, temporally-constrained, cyclic. For
example, the failure of a power generator may lead, within a certain amount of
time, to a depleted battery and then to the loss of an engine. In turn, the loss of
an engine may compromise the ability to generate power, which clearly requires 
the ability to deal with cyclic propagation graphs. Additionally, a failure of the 
control system might cause a pressure valve to become either stuck open or stuck 
closed; this requires the ability to deal with nondeterministic propagations. 

In this paper we tackle the problem of analyzing failure propagation in the full 
generality required by real-world applications. We start from Finite Degradation 
Structures (FDS) [14], a recently-proposed modeling framework, which unifies 
various combinational models traditionally used in safety analysis (such as fault 
trees and minimal cut sets) and generalizes them to deal with different levels of 
degradation. We propose a framework, referred to as PGFDS (Propagation Graphs 
over FDS), that allows modeling nondeterministic and cyclic propagation graphs.
The framework is general and can be used in other safety-critical domains. 

In order to deal with cyclic behaviours, PGFDS require a sequential semantics, 
expressed via symbolic transition systems. The computation of minimal cut sets 
over PGFDS can be carried out by means of techniques based on model checking, 
developed for the general case of behavioural models [6]. 

Then, we prove that it is possible to carry out the same analysis within a 
combinational setting, leveraging two widely adopted assumptions: that faults 
are persistent and that the fault propagation is monotone. These assumptions 
allow us to devise an efficient algorithm that can analyze fault propagations of 
realistic industrial benchmarks that are currently out of reach of state-of-the-art 
methods. The analysis of PGFDS is reduced to model enumeration for an SMT 
formula that does not require the explicit unrolling of the transition system. 
We tackle two key difficulties. The first one is to ensure causality and rule out 
self-supporting fault configurations in the combinational encoding. This is done 
by imposing cycle-breaking constraints requiring the existence of a partial order 
that is then constructed by the SMT solver during the analysis. The second one 
is to devise efficient enumeration techniques of models that are FDS-minimal, 
i.e., minimal with respect to the severity of the degradation given by the FDS. 
To this end, we propose an SMT-based enumerator of FDS-minimal models. 

We have experimentally evaluated our approach on a comprehensive set of 
realistic benchmarks, also generating random systems that have a similar structure
to our proprietary systems¹. The results demonstrate substantial advances
with respect to the state of the art. Our approach is clearly superior to the approach
proposed in [14], which is limited to the case of acyclic deterministic PGFDS.
For the cyclic PGFDSs, we contrast our approach against the sequential approach 
based on model-checking and show that our approach is able to scale to large 
PGFDS, dramatically outperforming the sequential approach. 

This paper is structured as follows. In Sect. 2 we present the mathematical 
notation and background on FDS. In Sect. 3 we describe Propagation Graphs over 
FDS (PGFDS). In Sect. 4 we present the combinational encoding of PGFDS into 
SMT. In Sect. 5 we describe how to use the SMT encoding for the enumeration
of FDS-minimal cut sets. In Sect. 6 we discuss some related work, and in Sect. 7
we present the experimental evaluation. In Sect. 8 we draw some conclusions and
outline directions for future work.

¹ Unfortunately the proprietary systems cannot be disclosed.


2 Preliminaries 


In this section, we explain the basic mathematical conventions that are used
in the paper. We assume that the reader is familiar with the basic ideas of 
Satisfiability Modulo Theories (SMT) and in particular with the theory of linear 
integer arithmetic and the DPLL(T) procedure, as presented, e.g., in [2]. 

If convenient, we define unary functions with small domains in-place extensionally,
e.g., $\{1 \mapsto 2, 2 \mapsto 3\}$ is a function with domain $\{1, 2\}$ that maps 1 to 2
and 2 to 3. We say that the $n$-ary function $f(x_1, x_2, \ldots, x_n)$ depends on its
formal argument $x_i$ if there are some values $v_1, v_2, \ldots, v_n, v_i'$ in the corresponding
domains such that $f(v_1, v_2, \ldots, v_i, \ldots, v_n) \neq f(v_1, v_2, \ldots, v_i', \ldots, v_n)$.
Given sets $A$ and $B$, we denote as $B^A$ the set of all functions from $A$ to $B$. Given a
partially ordered set $(A, \leq)$, its subset $B \subseteq A$ is called an upper (resp. lower) set
if for all $b \in B$, $a \in A$, the condition $a \geq b$ (resp. $a \leq b$) implies $a \in B$.

A Finite Degradation Structure (FDS) [14] is a triple $(FM, \leq, \bot)$, where $FM$
is a finite set of failure modes and $\leq$ is a partial order on $FM$ with the least
element $\bot$. For any set $A$ and an FDS $B = (FM_B, \leq_B, \bot_B)$, the FDS $B^A$ for
the set of functions from $A$ to $FM_B$ is defined as $((FM_B)^A, \leq_{B^A}, \bot_{B^A})$, where
$\bot_{B^A}(a) = \bot_B$ for all $a \in A$, and $f \leq_{B^A} f'$ if and only if $f(a) \leq_B f'(a)$ for all
$a \in A$. We assume that each FDS contains at least two elements. We say that
an FDS is Boolean if it is isomorphic to the structure $(\{\bot, \top\}, \bot < \top, \bot)$. In the
following, for an FDS $D = (FM, \leq, \bot)$, we denote elements of the set $FM$ with
$f, f'$ and call them failure modes.
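
As an illustration of these definitions, the following Python sketch represents an FDS by its failure modes and the strict pairs of its order, and lifts the order pointwise to functions from a set $A$. The order used for W3F (w below fs and fd, fd below fu) is an assumption read off Figs. 1 and 4; the names `leq`, `pointwise_leq`, and `all_functions` are illustrative and not part of the paper.

```python
from itertools import product

# Failure modes of W3F; "w" (working) is the least element.
FM = ["w", "fs", "fd", "fu"]
BOTTOM = "w"
# Strict order assumed from the Hasse diagram: w < fs and w < fd < fu.
STRICT = {("w", "fs"), ("w", "fd"), ("w", "fu"), ("fd", "fu")}


def leq(f, g):
    """Partial order of the FDS (reflexive closure of STRICT)."""
    return f == g or (f, g) in STRICT


def pointwise_leq(s, t):
    """Order of the FDS B^A: s <= t iff s(a) <= t(a) for every a in A."""
    return all(leq(s[a], t[a]) for a in s)


def all_functions(domain):
    """Enumerate all functions from `domain` to FM, represented as dicts."""
    for values in product(FM, repeat=len(domain)):
        yield dict(zip(domain, values))


A = ["c1", "c2"]
bottom_fn = {a: BOTTOM for a in A}  # least element of the lifted FDS
```

The constant function `bottom_fn` lies below every other function, which is what makes it the least element of the lifted structure.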

Given a first-order formula $\varphi$ over the language of the theory of linear integer
arithmetic, an assignment $\mu$ that assigns a value $\mu(b) \in \{false, true\}$ to each
free Boolean variable $b$ of $\varphi$ and a value $\mu(n) \in \mathbb{Z}$ to each free integer variable $n$
of $\varphi$ is called a model of $\varphi$ (denoted $\mu \models \varphi$) if $\mu$ makes $\varphi$ true. If $B$ is a subset
of free Boolean variables of $\varphi$, the model $\mu \models \varphi$ is called subset-minimal with
respect to $B$ if there is no model $\mu' \models \varphi$ such that
$\{b \in B \mid \mu'(b) = true\} \subsetneq \{b \in B \mid \mu(b) = true\}$.
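
The notion of subset-minimality can be made concrete with a small brute-force sketch. It assumes a purely Boolean formula given as a Python predicate (integer variables are left out for brevity); the formula `phi` and all function names are hypothetical examples.

```python
from itertools import product


def models(formula, variables):
    """Enumerate all Boolean assignments that satisfy `formula`."""
    for values in product([False, True], repeat=len(variables)):
        mu = dict(zip(variables, values))
        if formula(mu):
            yield mu


def true_set(mu, B):
    """The variables of B that mu assigns to true."""
    return frozenset(b for b in B if mu[b])


def subset_minimal_models(formula, variables, B):
    """Models whose true-set over B is not a strict superset of another's."""
    all_models = list(models(formula, variables))
    return [
        mu
        for mu in all_models
        if not any(true_set(nu, B) < true_set(mu, B) for nu in all_models)
    ]


def phi(mu):
    """Example formula: (a or b) and (b -> c)."""
    return (mu["a"] or mu["b"]) and ((not mu["b"]) or mu["c"])


VARS = ["a", "b", "c"]
minimal = subset_minimal_models(phi, VARS, ["a", "b"])
```

Several models may share the same minimal true-set over $B$; here both assignments with only `a` true among `a`, `b` are subset-minimal, regardless of the value of `c`.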

A transition system $TS$ is a tuple $(X, I, T)$ where $X$ is a set of (state) variables,
$I(X)$ is a formula representing the initial states, and $T(X, X')$ is a formula
representing the transitions. A state of $TS$ is an assignment to the variables $X$.
A trace of $TS$ is a (possibly infinite) sequence $s_0, s_1, \ldots$ of states such that $s_0 \models I$
and, for all $i \geq 0$, $s_i, s_{i+1}' \models T$.


3 Propagation Graphs over FDSs 


In this section, we introduce our model for fault propagation, which we call
Propagation Graphs over FDSs (PGFDS), and provide a sequential semantics for
it which can be used to encode PGFDSs into transition systems.

Intuitively, a Propagation Graph over FDS (PGFDS) consists of a set of components
of the system and of the next function. In each step of the failure
propagation, each component is in some failure mode from the underlying FDS.
In the next step of the failure propagation, each component can either 1) stay in
its previous failure mode or 2) switch to an arbitrary failure mode from the set
of possible next failure modes. The set of possible next failure modes for each
component is given by the function next, based on the current failure modes of
all components in the system.


Definition 1 (Propagation Graph over FDS (PGFDS)). Given a finite
degradation structure $D = (FM, \leq, \bot)$, a propagation graph over $D$ is a pair
$S = (C, next)$, where


- $C$ is a finite set of system components, and

- $next\colon C \to (FM^C \to 2^{FM})$ is a mapping that assigns to each component
$c \in C$ a next failure mode function $next(c)$, which maps failure modes of all
components in $C$ to a set of possible next failure modes of $c$.


A state of $S$ is a mapping $s\colon C \to FM$ that assigns a failure mode $f \in FM$ to
each system component $c \in C$.


Example 1. Consider a system with three components, H (hydraulic), E (electric),
and G (control on ground), over the Boolean FDS $(\{\bot, \top\}, \bot < \top, \bot)$.
Each of the components is either working correctly (represented by the failure
mode $\bot$) or incorrectly ($\top$). Component G depends on the correct functionality
of either E or H. Component E depends on H to function correctly and, symmetrically,
H depends on E. The failure propagation of this system can be described
by a PGFDS $S = (\{G, E, H\}, next)$, where


- $next(G)(s) = \{\top\}$ if $s(E) = s(H) = \top$ and $next(G)(s) = \emptyset$ otherwise;
- $next(E)(s) = \{\top\}$ if $s(H) = \top$ and $next(E)(s) = \emptyset$ otherwise;
- $next(H)(s) = \{\top\}$ if $s(E) = \top$ and $next(H)(s) = \emptyset$ otherwise.


Note that $next(c)(s) = \emptyset$ means that if the system is in the state $s$, the component
$c$ cannot change its current failure mode.

The structure is intuitively associated with the hypergraph depicted in Fig. 2. 
The dashed rectangles represent the fact that each component can fail on its own 
(locally); the hyper-arc from E and H to G is conjunctive, while the arcs incoming 
into a node are disjunctive. 


An important assumption of our approach is that we consider only fault-persistent
propagations, i.e., fault propagations where each component can fail
only once and, after it does, stays in the same failure mode forever. Note that
this is a realistic assumption that is also used in other techniques for reliability
analysis [5]. It is also implicitly used in other modeling techniques that are purely
combinational (e.g., [19]) because they model the system only in a single time
step, without considering any change in time whatsoever. A single propagation step
of such a computation is described by a fault-persistent transition relation, and
a whole computation by a fault-persistent failure propagation.

Fig. 2. The hypergraph view of a simple PGFDS.


Definition 2 (Fault-persistent transition relation). Let $S = (C, next)$ be
a PGFDS over an FDS with the least element $\bot$. The fault-persistent transition
relation of $S$, denoted as $R_S$, is the binary relation between states of $S$ such that
for all states $s, s'$, the relation $R_S(s, s')$ holds if and only if for each $c \in C$


- $s'(c) = s(c)$ or
- $s(c) = \bot$ and $s'(c) \in next(c)(s)$.


Definition 3 (Fault-persistent failure propagation). Given a PGFDS $S =
(C, next)$, its fault-persistent transition relation $R_S$, and $k \in \mathbb{N}$, the sequence
$(s_i)_{0 \leq i \leq k}$ of states of $S$ is called a fault-persistent failure propagation if the
relation $R_S(s_i, s_{i+1})$ holds for all $0 \leq i < k$.
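
Definitions 2 and 3 can be sketched in Python for the system of Example 1. This is an illustrative brute-force rendering, not the symbolic encoding used in the paper; the string constants `BOT` and `TOP` and the function names are assumptions made for the example.

```python
from itertools import product

BOT, TOP = "bot", "top"   # the two failure modes of the Boolean FDS
C = ["G", "E", "H"]       # components of Example 1


def next_modes(c, s):
    """next(c)(s) for the PGFDS of Example 1."""
    if c == "G":
        return {TOP} if s["E"] == TOP and s["H"] == TOP else set()
    if c == "E":
        return {TOP} if s["H"] == TOP else set()
    return {TOP} if s["E"] == TOP else set()  # c == "H"


def related(s, t):
    """R_S(s, t): each component keeps its mode, or leaves BOT via next."""
    return all(
        t[c] == s[c] or (s[c] == BOT and t[c] in next_modes(c, s))
        for c in C
    )


def successors(s):
    """All states t with R_S(s, t), found by brute force."""
    for values in product([BOT, TOP], repeat=len(C)):
        t = dict(zip(C, values))
        if related(s, t):
            yield t


def is_propagation(states):
    """Definition 3: consecutive states are related by R_S."""
    return all(related(a, b) for a, b in zip(states, states[1:]))


s0 = {"G": BOT, "E": TOP, "H": BOT}
s1 = {"G": BOT, "E": TOP, "H": TOP}
s2 = {"G": TOP, "E": TOP, "H": TOP}
```

The sequence `[s0, s1, s2]` is a failure propagation, whereas jumping from `s0` directly to `s2` is not, because G can fail only after both E and H have failed.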


Because we deal only with fault-persistent failure propagations in this paper,
from now on we refer to the fault-persistent transition relation and the
fault-persistent failure propagation simply as transition relation and failure
propagation, respectively.


Definition 4 (Cyclic PGFDS). Let $S = (C, next)$ be a PGFDS. A component
$c \in C$ depends on a component $d \in C$ iff $next(c)(s) \neq next(c)(s')$ for some
$s, s'\colon C \to FM$ such that $s(d) \neq s'(d)$ and $s(c') = s'(c')$ for all $c' \neq d$. Let
$deps(c) := \{d \in C \mid c \text{ depends on } d\}$, $D \subseteq C \times C$ be such that $D(c, c')$ if and
only if $c' \in deps(c)$, and let $D^*$ be the transitive closure of $D$. Then we say that
$S$ is cyclic if and only if there exists $c \in C$ such that $D^*(c, c)$ holds.


Example 2. In the PGFDS S from Example 1, the component G depends on com- 
ponents E and H, the component E depends on H, and the component H depends 
on E. The PGFDS S is therefore cyclic because E (and also H) transitively depends
on itself. 
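
The dependency relation of Definition 4 can be computed by brute force on small systems. The sketch below does this for Example 1; it is meant only to illustrate the definition, and the helper names (`depends_on`, `reachable`) are hypothetical.

```python
from itertools import product

BOT, TOP = "bot", "top"
FM = [BOT, TOP]
C = ["G", "E", "H"]


def next_modes(c, s):
    """next(c)(s) for the PGFDS of Example 1."""
    if c == "G":
        return {TOP} if s["E"] == TOP and s["H"] == TOP else set()
    if c == "E":
        return {TOP} if s["H"] == TOP else set()
    return {TOP} if s["E"] == TOP else set()  # c == "H"


def states():
    for values in product(FM, repeat=len(C)):
        yield dict(zip(C, values))


def depends_on(c, d):
    """c depends on d: changing only s(d) can change next(c)(s)."""
    for s in states():
        for f in FM:
            if f != s[d]:
                t = dict(s, **{d: f})
                if next_modes(c, s) != next_modes(c, t):
                    return True
    return False


deps = {c: {d for d in C if depends_on(c, d)} for c in C}


def reachable(c):
    """Components reachable from c via one or more dependency edges."""
    seen, stack = set(), list(deps[c])
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(deps[d])
    return seen


is_cyclic = any(c in reachable(c) for c in C)
```

For Example 1 this reproduces the dependencies listed in Example 2: E and H reach themselves through each other, while G does not lie on any cycle.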


To analyze the reliability of the modeled system, it is important to identify the
failures of its components (i.e., assignment of failure modes to the components)
which cause the system to reach a given set of dangerous states, usually called top
level event (TLE). Such assignments are called cut sets. Since the number of all
cut sets can be prohibitively large, it is often enough to identify the least severe
failures in terms of the underlying FDS that are sufficient to cause the TLE.
Such cut sets are called FDS-minimal, or minimal for short. These concepts are
formalized in the following definitions.


Definition 5 (Top Level Event). Given a PGFDS $S$, a Top Level Event
(TLE) is an arbitrary set of states of $S$.


Definition 6 ((FDS-Minimal) Cut Set). Given a PGFDS $S = (C, next)$, and
a top level event TLE, a cut set is any state $s$ for which there is a fault-persistent
failure propagation that starts in $s$ and ends in some $s_k \in TLE$. A cut set is
called FDS-minimal (or minimal for short) if it is minimal with respect to the
pointwise ordering $\leq$ of the underlying FDS.


Given a system S and a top level event TLE, we denote the set of all correspond- 
ing cut sets as CS(S, TLE) and the set of all minimal cut sets as MCS(S, TLE). 
As a convention, when talking about cut sets, we will explicitly mention only
the components to which the cut set assigns a failure mode different from $\bot$.


Example 3. Consider again the PGFDS $S$ from Example 1 and the top level event
$TLE = \{s\colon \{G, E, H\} \to \{\top, \bot\} \mid s(G) = \top\}$, which corresponds to the component
G not working correctly. The minimal cut sets for the PGFDS $S$ and the
given top level event are


1. $\{G \mapsto \top\}$, witnessed by a failure propagation
   $(\{G \mapsto \top, E \mapsto \bot, H \mapsto \bot\})$ of length 1.

2. $\{E \mapsto \top\}$, witnessed by a failure propagation
   $(\{G \mapsto \bot, E \mapsto \top, H \mapsto \bot\}, \{G \mapsto \bot, E \mapsto \top, H \mapsto \top\}, \{G \mapsto \top, E \mapsto \top, H \mapsto \top\})$
   of length 3.

3. $\{H \mapsto \top\}$, witnessed by a failure propagation
   $(\{G \mapsto \bot, E \mapsto \bot, H \mapsto \top\}, \{G \mapsto \bot, E \mapsto \top, H \mapsto \top\}, \{G \mapsto \top, E \mapsto \top, H \mapsto \top\})$
   of length 3.


Note that besides these three minimal cut sets, there are other cut sets that are
not minimal, such as $\{E \mapsto \top, H \mapsto \top\}$.
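
The cut sets of Example 3 can be reproduced with an explicit-state reachability search. The following sketch is a naive enumeration intended only to illustrate Definition 6 on this three-component system; it is not the SMT-based method of the paper, and states are represented as tuples ordered as in the assumed list `C`.

```python
from itertools import product

BOT, TOP = "bot", "top"
C = ["G", "E", "H"]


def next_modes(c, s):
    """next(c)(s) for the PGFDS of Example 1."""
    if c == "G":
        return {TOP} if s["E"] == TOP and s["H"] == TOP else set()
    if c == "E":
        return {TOP} if s["H"] == TOP else set()
    return {TOP} if s["E"] == TOP else set()  # c == "H"


def successors(state):
    """States reachable in one fault-persistent step."""
    s = dict(zip(C, state))
    options = []
    for c in C:
        modes = {s[c]}
        if s[c] == BOT:
            modes |= next_modes(c, s)
        options.append(sorted(modes))
    return set(product(*options))


def can_reach_tle(start, in_tle):
    """Depth-first search for a state of the top level event."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if in_tle(dict(zip(C, state))):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def leq(s, t):
    """Pointwise order of the Boolean FDS (BOT below TOP)."""
    return all(a == b or a == BOT for a, b in zip(s, t))


ALL_STATES = list(product([BOT, TOP], repeat=len(C)))
cut_sets = [s for s in ALL_STATES if can_reach_tle(s, lambda m: m["G"] == TOP)]
minimal = [s for s in cut_sets
           if not any(t != s and leq(t, s) for t in cut_sets)]
```

Every state except the fully working one is a cut set here, and the three minimal ones are exactly the single faults listed in the example.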


Fault-persistent computations of a PGFDS can be easily represented as traces 
of a (symbolic) transition system. 


Definition 7 (Fault-persistent transition system). Given a PGFDS $S =
(C, next)$ and an FDS $D = (FM, \leq, \bot)$, the corresponding fault-persistent
(symbolic) transition system is given by $TS_S = (X, true, T)$, where:


- $X = \{x_c \mid c \in C\}$ is the set of state variables, with domain $FM$;

- $T(X, X')$ is a symbolic encoding of the fault-persistent transition relation of
$S$ as given in Definition 2. That is, for each assignment $\mu\colon X \cup X' \to FM$,
$\mu \models T$ if and only if $R_S(s, s')$ holds, where $s\colon C \to FM$ is defined as $s(c) =
\mu(x_c)$ (and similarly for $s'$).


By definition, every fault-persistent computation of $S$ has a corresponding trace
(of the same length) in $TS_S$. Therefore, encoding PGFDSs as transition systems
allows leveraging off-the-shelf algorithms for subset-minimal cut set enumeration,
such as those given in [6]. However, this might be inefficient, particularly
for TLEs that are triggered by long failure propagations (corresponding
to equally-long traces of the induced transition system). Moreover, as we show
later, enumerating FDS-minimal cut sets is more involved.

Fault propagation systems used in practice often have the property that no
transition can be disabled by additional faults, i.e., by switching a failure mode
of a component from $\bot$ to $f \neq \bot$. This is also the case for the PGFDS from
Example 1. Such systems are called subset-monotone or monotone for short.
This is formalized by the following definition.


Definition 8 (Subset-monotone PGFDS). A PGFDS $S = (C, next)$ is called
subset-monotone if for all $s, s'\colon C \to FM$, the condition $\forall c \in C.\ s(c) \neq \bot \to
s(c) = s'(c)$ implies $\forall c \in C.\ next(c)(s) \subseteq next(c)(s')$.
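
Definition 8 can be checked exhaustively on small systems. The sketch below tests the system of Example 1 and, for contrast, a hypothetical variant `next_masked` (not taken from the paper) in which a fault of G prevents E from failing, so an additional fault disables a transition.

```python
from itertools import product

BOT, TOP = "bot", "top"
C = ["G", "E", "H"]


def next_example1(c, s):
    """next(c)(s) for the PGFDS of Example 1."""
    if c == "G":
        return {TOP} if s["E"] == TOP and s["H"] == TOP else set()
    other = "H" if c == "E" else "E"
    return {TOP} if s[other] == TOP else set()


def next_masked(c, s):
    """Hypothetical variant: a fault of G prevents E from failing."""
    if c == "E" and s["G"] == TOP:
        return set()
    return next_example1(c, s)


def states():
    for values in product([BOT, TOP], repeat=len(C)):
        yield dict(zip(C, values))


def adds_faults(s, t):
    """Premise of Definition 8: t agrees with s wherever s is not BOT."""
    return all(s[c] == BOT or s[c] == t[c] for c in C)


def is_subset_monotone(next_fn):
    """Check next(c)(s) <= next(c)(t) for all s, t related by adds_faults."""
    all_states = list(states())
    return all(
        next_fn(c, s) <= next_fn(c, t)
        for s in all_states
        for t in all_states
        if adds_faults(s, t)
        for c in C
    )
```

The exhaustive check is exponential in the number of components and is only meant to make the definition tangible.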


4 From Sequential to Combinational 


In this section, we describe a combinational encoding of fault-persistent computations
of a PGFDS, which is guaranteed to be exact for subset-monotone PGFDSs
and provides a useful overapproximation for general PGFDSs. In the rest of the
section, let $S = (C, next)$ be a PGFDS over the FDS $D = (FM, \leq, \bot)$, and TLE
be a top level event. We show how to construct a first-order formula $\varphi_{cs}$ over
the theory of linear integer arithmetic whose models correspond to cut sets of $S$
with respect to TLE. In the next section, we then use this formula to enumerate
all FDS-minimal cut sets of $S$.

To encode the propagations of $S$, for each component $c \in C$ and each failure
mode $f \in FM$ we introduce two Boolean variables: $I_{c,f}$ and $F_{c,f}$. The variable
$I_{c,f}$ encodes whether $c$ was in the failure mode $f$ in the initial state of the
propagation. The variable $F_{c,f}$ encodes whether $c$ has been in the failure mode
$f$ at any time during the propagation. We can then encode TLE as a formula
$\varphi_{TLE}$ over variables $F_{c,f}$.²

Considering now a possible propagation, a component $c$ can be in failure
mode $f \neq \bot$ at some time during the propagation for two reasons: either it was
already in $f$ in the initial state of the propagation, or it transitions to $f$ because
of its next function. The first case is represented by $I_{c,f}$ being true. The second
case can be encoded as follows (for each $c \in C$ and $f \in FM \setminus \{\bot\}$):³


$$\bigvee_{\substack{s\colon C \to FM \\ f \in next(c)(s)}} \; \bigwedge_{\substack{d \in deps(c) \\ s(d) \neq \bot}} F_{d,s(d)} \qquad (1)$$


stating that there must exist a row in the truth table of $next(c)$, whose result
includes $f$ and which agrees with the current state on the failure modes of failed
dependencies. The above, however, would not work in the presence of cycles.
This can already be seen on the simple cyclic PGFDS from Example 1.


² A naive encoding would be using the formula
$\bigvee_{s \in TLE} \bigl(\bigwedge_{c \in C,\, s(c) \neq \bot} F_{c,s(c)} \wedge \bigwedge_{c \in C,\, s(c) = \bot} \bigwedge_{f \in FM \setminus \{\bot\}} \neg F_{c,f}\bigr)$,
but more compact representations are of course possible (particularly if TLE is
given symbolically).

³ This formula can again be encoded more compactly, particularly if the next function
is given symbolically, which is usually the case in practice.


Example 4. Consider again the PGFDS $S$ from Example 1. The above-described
encoding of the propagations of $S$ is


$$(F_{G,\top} \to (I_{G,\top} \vee (F_{E,\top} \wedge F_{H,\top}))) \;\wedge$$
$$(F_{E,\top} \to (I_{E,\top} \vee F_{H,\top})) \;\wedge$$
$$(F_{H,\top} \to (I_{H,\top} \vee F_{E,\top})).$$


Although this encoding has a model $\mu$ such that
$\mu \models \neg I_{G,\top} \wedge \neg I_{E,\top} \wedge \neg I_{H,\top} \wedge F_{G,\top} \wedge F_{E,\top} \wedge F_{H,\top}$,
there is no propagation path of $S$ in which both components
E and H are initially in the state $\bot$ and switch to state $\top$ during the propagation.
The problem is that the encoding allows models where a failure of E was caused
by a failure of H, which was in turn caused by the same failure of E.
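
The spurious model of Example 4 can be checked mechanically. The sketch below evaluates the three implications of the example on that assignment; variable names such as `I_G` and `F_G` are plain-text stand-ins for $I_{G,\top}$ and $F_{G,\top}$ chosen for this illustration.

```python
def implies(a, b):
    """Material implication."""
    return (not a) or b


def naive_encoding(m):
    """Formula of Example 4, without any causal ordering."""
    return (
        implies(m["F_G"], m["I_G"] or (m["F_E"] and m["F_H"]))
        and implies(m["F_E"], m["I_E"] or m["F_H"])
        and implies(m["F_H"], m["I_H"] or m["F_E"])
    )


# No component fails initially, yet all three are marked as failed.
spurious = {
    "I_G": False, "I_E": False, "I_H": False,
    "F_G": True, "F_E": True, "F_H": True,
}

# The all-false assignment (nothing ever fails) is a legitimate model.
nothing_fails = {name: False for name in spurious}
```

The assignment `spurious` satisfies the formula even though it corresponds to no propagation of the system, which is precisely the self-supporting behaviour described in the example.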




In order to solve the problem, we introduce constraints imposing a causal
ordering among the components, stating that the failure of a component can be
caused only by other components that precede it in the causal order. We encode
this by introducing one additional integer variable $o_c$ for each component $c$,
which intuitively corresponds to the time when the component $c$ switched to a
failure mode different from $\bot$, and modifying the formula (1) to take the causal
ordering into account:⁴


$$\bigvee_{\substack{s\colon C \to FM \\ f \in next(c)(s)}} \; \bigwedge_{\substack{d \in deps(c) \\ s(d) \neq \bot}} (F_{d,s(d)} \wedge o_d < o_c). \qquad (2)$$


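Putting it all together, the encoding for the failure mode changes is given by the
formula $\varphi_{next}$ below:


$$\varphi_{next} = \bigwedge_{\substack{c \in C \\ f \in FM \setminus \{\bot\}}} (F_{c,f} \to (I_{c,f} \vee (2))) \wedge (I_{c,f} \to F_{c,f})$$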


Example 5. For the PGFDS $S$ from Example 1, the correct encoding of the propagations
of $S$ is thus the following formula $\varphi_{next}$:


$$(F_{G,\top} \to (I_{G,\top} \vee ((F_{E,\top} \wedge o_E < o_G) \wedge (F_{H,\top} \wedge o_H < o_G)))) \;\wedge$$
$$(I_{G,\top} \to F_{G,\top}) \;\wedge$$
$$(F_{E,\top} \to (I_{E,\top} \vee (F_{H,\top} \wedge o_H < o_E))) \;\wedge$$
$$(I_{E,\top} \to F_{E,\top}) \;\wedge$$
$$(F_{H,\top} \to (I_{H,\top} \vee (F_{E,\top} \wedge o_E < o_H))) \;\wedge$$
$$(I_{H,\top} \to F_{H,\top}).$$


Note that the constraints for causal ordering now rule out the spurious
self-supporting propagation in which E fails because of H and H fails because of E.
This would require that $o_H < o_E$ and $o_E < o_H$ are both true, which is clearly
impossible in the theory of linear integer arithmetic (or, more generally, in any
theory in which $<$ is interpreted as a strict ordering relation).


⁴ We remark that such ordering constraints are needed only if the input PGFDS is
cyclic, and only between components in the same strongly connected component of
the dependency graph.

The propagations of $S$ mentioned in Example 3 correspond to the following
assignments:


1. The propagation for the cut set $\{G \mapsto \top\}$ corresponds to an assignment $\mu$
such that $\mu \models I_{G,\top} \wedge \neg I_{E,\top} \wedge \neg I_{H,\top} \wedge F_{G,\top} \wedge \neg F_{E,\top} \wedge \neg F_{H,\top}$ and
$\mu(o_G) = \mu(o_E) = \mu(o_H) = 0$.

2. The propagation for the cut set $\{E \mapsto \top\}$ corresponds to an assignment $\mu$
such that $\mu \models \neg I_{G,\top} \wedge I_{E,\top} \wedge \neg I_{H,\top} \wedge F_{G,\top} \wedge F_{E,\top} \wedge F_{H,\top}$ and
$\mu(o_G) = 2$, $\mu(o_E) = 0$, $\mu(o_H) = 1$.

3. The propagation for the cut set $\{H \mapsto \top\}$ corresponds to an assignment $\mu$
such that $\mu \models \neg I_{G,\top} \wedge \neg I_{E,\top} \wedge I_{H,\top} \wedge F_{G,\top} \wedge F_{E,\top} \wedge F_{H,\top}$ and
$\mu(o_G) = 2$, $\mu(o_E) = 1$, $\mu(o_H) = 0$.


These assignments are not unique; there are infinitely many choices for the values
of the ordering variables $o_c$. Also note that there is no global causality ordering
for the system: the causality ordering is different for different propagations.
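
The effect of the ordering variables can be explored by brute force. The sketch below enumerates models of the formula of Example 5, under the assumption that it suffices to try order values in {0, 1, 2}: with three variables compared only by strict inequalities, any satisfying assignment can be renumbered into that range. The dictionaries `I`, `F`, and `o` are illustrative stand-ins for the variables $I_{c,\top}$, $F_{c,\top}$, and $o_c$.

```python
from itertools import product

COMPONENTS = ["G", "E", "H"]


def implies(a, b):
    return (not a) or b


def phi_next(I, F, o):
    """Formula of Example 5, including the causal ordering constraints."""
    return (
        implies(F["G"], I["G"] or (F["E"] and o["E"] < o["G"]
                                   and F["H"] and o["H"] < o["G"]))
        and implies(I["G"], F["G"])
        and implies(F["E"], I["E"] or (F["H"] and o["H"] < o["E"]))
        and implies(I["E"], F["E"])
        and implies(F["H"], I["H"] or (F["E"] and o["E"] < o["H"]))
        and implies(I["H"], F["H"])
    )


def models():
    """All models with order values restricted to {0, 1, 2}."""
    for bits in product([False, True], repeat=6):
        I = dict(zip(COMPONENTS, bits[:3]))
        F = dict(zip(COMPONENTS, bits[3:]))
        for order in product(range(3), repeat=3):
            o = dict(zip(COMPONENTS, order))
            if phi_next(I, F, o):
                yield I, F, o


# Models in which something fails although nothing failed initially.
self_supporting = [
    (I, F, o) for I, F, o in models()
    if not any(I.values()) and any(F.values())
]
```

No self-supporting model survives, while the assignment listed above for the cut set $\{E \mapsto \top\}$ is among the enumerated models.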


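Finally, we encode the fault-persistence constraint by stating that no component
can be in two failure modes either in the initial state of the propagation or at
any time during the propagation:


$$\varphi_{once} = \bigwedge_{\substack{c \in C \\ f, f' \in FM \setminus \{\bot\} \\ f \neq f'}} (\neg I_{c,f} \vee \neg I_{c,f'}) \wedge (\neg F_{c,f} \vee \neg F_{c,f'}).$$


The final formula is then given by $\varphi_{cs}$:


$$\varphi_{cs} = \varphi_{TLE} \wedge \varphi_{next} \wedge \varphi_{once}.$$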


As the following theorem shows, the formula $\varphi_{cs}$ for general systems encodes
an overapproximation of the set $CS(S, TLE)$. The reason for this is that the
encoding does not constrain the failure modes of dependencies that are working, i.e.,
are in the failure mode $\bot$. Note that even an overapproximation of $CS(S, TLE)$
is useful for safety analysis; it can be used, for example, for computing an upper
bound on the probability of failure of the system. Moreover, if the system $S$ is
subset-monotone, which is often the case in practice, the formula $\varphi_{cs}$ is guaranteed
to encode the set $CS(S, TLE)$ exactly.

To formulate the relationship precisely, we define the function that provides
the correspondence between the models of $\varphi_{cs}$ and the cut sets of $S$. Observe
that thanks to $\varphi_{once}$, each model $\mu$ of $\varphi_{cs}$ corresponds to a unique initial state
$modelToState(\mu)$ of $S$ as defined below:


$$modelToState(\mu)(c) = \begin{cases} f, & \text{if } \{f \in FM \setminus \{\bot\} \mid \mu(I_{c,f}) = true\} = \{f\}, \\ \bot, & \text{if } \{f \in FM \setminus \{\bot\} \mid \mu(I_{c,f}) = true\} = \emptyset. \end{cases}$$


MCS-enumeration($\varphi_{cs}$, modelToState):

 1. solver := SMT-solver()
 2. res := $\emptyset$
 3. assert-formula(solver, $\varphi_{cs}$)
 4. for $I_{c,f} \in vars(\varphi_{cs})$:
 5.     add-preferred-var(solver, $I_{c,f}$, false)
 6. while check-sat(solver):
 7.     $\mu$ := get-model(solver)
 8.     $\psi$ := true
 9.     for $I_{c,f} \in vars(\varphi_{cs})$:
10.         if $\mu(I_{c,f})$ = true:
11.             $\psi$ := $\psi \wedge I_{c,f}$
12.     res := res $\cup$ {modelToState($\mu$)}
13.     assert-formula(solver, $\neg\psi$)
14. return res


Fig. 3. SMT-based MCS enumeration algorithm.


Theorem 1. For an arbitrary PGFDS $S$ and a top level event TLE,


$$CS(S, TLE) \subseteq \{modelToState(\mu) \mid \mu \models \varphi_{cs}\}.$$


Moreover, if $S$ is subset-monotone, these sets are equal.


5 Enumeration of FDS-Minimal Cut Sets 


In this section, we show how to efficiently enumerate FDS-minimal cut sets of 
subset-monotone systems using the formula $\varphi_{cs}$ and an SMT solver. We first
consider a simplified case, in which the underlying FDS D is Boolean. We then 
show how to generalize our solution to arbitrary FDSs. 


5.1 Algorithm for Boolean FDSs 


The pseudo-code of our procedure for the case when the underlying FDS is 
Boolean is shown in Fig. 3. Intuitively, the algorithm enumerates all the subset-minimal
models of $\varphi_{cs}$ with respect to the set of variables of the form $I_{c,f}$. These
models are enumerated one by one and each enumerated model is, together with 
all its supermodels, blocked by the assertion on line 13, until the formula becomes 
unsatisfiable. Each model of the formula is converted to a cut set by the function 
modelToState. 

The algorithm makes use of a DPLL(T)-based SMT solver that provides the 
following functionalities: 


1. An assert-formula method that allows adding constraints incrementally;
2. A check-sat method to determine the satisfiability of the current set of
constraints;
3. A get-model method that returns a model for the current asserted set of
constraints, in case they are satisfiable;

4. An add-preferred-var method that allows controlling the branching heuristics
of the internal SAT engine of the solver, such that whenever a SAT decision
needs to be performed, variables in the preferred set are always considered
before the other variables for branching, and are assigned the value specified
in the add-preferred-var call.⁵
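
The control flow of Fig. 3 can be sketched in Python for the system of Example 1. The function `check_sat` below is a brute-force stand-in for the solver functionality listed above, not an SMT solver: it assumes that trying sets of initially failed components by increasing size yields a subset-minimal model, which plays the role of the preferred-variable heuristic. Order values are again restricted to {0, 1, 2}, an assumption that is safe for three components.

```python
from itertools import combinations, product

COMPONENTS = ["G", "E", "H"]


def implies(a, b):
    return (not a) or b


def phi_cs(I, F, o):
    """phi_TLE and phi_next for Example 1, with TLE = 'G has failed'."""
    return (
        F["G"]
        and implies(F["G"], I["G"] or (F["E"] and o["E"] < o["G"]
                                       and F["H"] and o["H"] < o["G"]))
        and implies(I["G"], F["G"])
        and implies(F["E"], I["E"] or (F["H"] and o["H"] < o["E"]))
        and implies(I["E"], F["E"])
        and implies(F["H"], I["H"] or (F["E"] and o["E"] < o["H"]))
        and implies(I["H"], F["H"])
    )


def satisfiable_with(initial):
    """Is there an assignment to F and o for the given initial faults?"""
    I = {c: c in initial for c in COMPONENTS}
    for bits in product([False, True], repeat=3):
        F = dict(zip(COMPONENTS, bits))
        for order in product(range(3), repeat=3):
            if phi_cs(I, F, dict(zip(COMPONENTS, order))):
                return True
    return False


def check_sat(blocked):
    """Return a subset-minimal set of true I-variables, or None."""
    for size in range(len(COMPONENTS) + 1):
        for initial in combinations(COMPONENTS, size):
            chosen = frozenset(initial)
            if any(b <= chosen for b in blocked):
                continue  # violates a blocking clause (line 13 of Fig. 3)
            if satisfiable_with(chosen):
                return chosen
    return None


def mcs_enumeration():
    blocked, res = [], set()
    while True:
        model = check_sat(blocked)
        if model is None:
            return res
        res.add(model)         # modelToState: initially failed components
        blocked.append(model)  # blocks the model and all its supermodels
```

On Example 1 the loop returns the three singleton cut sets of Example 3 and then terminates, because every remaining candidate is a superset of a blocked model.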


The correctness of our algorithm is formalized by the theorem below.


Theorem 2 (MCS enumeration over Boolean FDS). For a subset-monotone
PGFDS $S$ over the Boolean FDS, the result of
MCS-enumeration($\varphi_{cs}$, modelToState) is the set of all FDS-minimal cut sets of $S$.


Proof. Let $S = (C, next)$ be a subset-monotone PGFDS. It was proven by Di
Rosa et al. [15] that if branching heuristics of a CDCL-based SAT solver are
modified to assign false to a subset $V$ of variables before branching on other
variables (lines 4-5 of our pseudocode), the produced model is subset-minimal
with respect to the set of variables $V$. This claim straightforwardly extends to
DPLL(T)-based SMT solvers. In every iteration, the algorithm thus finds one
subset-minimal model $\mu$ of $\varphi_{cs}$ with respect to the set of variables $I_{c,f}$ and
adds a constraint that prevents enumerating any model $\mu'$ such that
$\{I_{c,f} \in vars(\varphi_{cs}) \mid \mu(I_{c,f}) = true\} \subseteq \{I_{c,f} \in vars(\varphi_{cs}) \mid \mu'(I_{c,f}) = true\}$ in the
following iterations. Therefore, the described algorithm enumerates, for each
model $\mu$ of the formula $\exists \{F_{c,f} \mid c \in C, f \in FM\}\, \exists \{o_c \mid c \in C\}\, (\varphi_{cs})$ that is
subset-minimal with respect to the set of variables $I_{c,f}$, exactly one model $\mu'$ of
$\varphi_{cs}$ that agrees with $\mu$ on all variables $I_{c,f}$.

Note that $vars(\varphi_{cs})$ does not contain the variable $I_{c,\bot}$ for any $c \in C$. For a
Boolean FDS and models $\mu, \mu' \models \varphi_{cs}$, we thus have
$\{I_{c,f} \in vars(\varphi_{cs}) \mid \mu(I_{c,f}) = true\} \subseteq \{I_{c,f} \in vars(\varphi_{cs}) \mid \mu'(I_{c,f}) = true\}$
if and only if $modelToState(\mu) \leq modelToState(\mu')$. Therefore, Theorem 1 implies
that for subset-monotone $S$, subset-minimal models of $\varphi_{cs}$ with respect to the
set of variables of the form $I_{c,f}$ precisely correspond to FDS-minimal cut sets of $S$
and the correspondence is given by the function modelToState.


5.2 Extension to Arbitrary FDSs 


The algorithm of Fig. 3 does not work in general for arbitrary FDSs, but only for
the FDSs in which all the failure modes different from $\bot$ are incomparable. The
problem is that the assumption that a cut set is FDS-minimal iff the corresponding
model of $\varphi_{cs}$ is subset-minimal with respect to the set of variables $I_{c,f}$ with
$f \neq \bot$ does not hold in general with the encoding of Sect. 4, as can be seen in
the following simple example.


⁵ For example, calling add-preferred-var(solver, v, true) means that if the solver has
to perform a case split, v will be assigned before all non-preferred variables, and it
will always be assigned to true by the branching heuristic.


$\{w, fd, fu\} = w \wedge \neg fs \wedge fd \wedge fu$
$\{w, fs\} = w \wedge fs \wedge \neg fd \wedge \neg fu$        $\{w, fd\} = w \wedge \neg fs \wedge fd \wedge \neg fu$
$\{w\} = w \wedge \neg fs \wedge \neg fd \wedge \neg fu$


Fig. 4. Hasse diagram of the ordered set $(W3F{\downarrow}, \subseteq)$ together with the encoding of the
elements as formulas.


Example 6. Consider the FDS $D = (\{\bot, m, \top\}, \bot < m < \top, \bot)$ and the PGFDS
$S = (\{c\}, next)$ with $next(c)(s) = \emptyset$ for all $c$ and $s$. Intuitively, $S$ contains
one component that cannot change its failure mode during the computation.
Consider further the top-level event $TLE = \{\{c \mapsto m\}, \{c \mapsto \top\}\}$.

Both $\{c \mapsto \top\}$ and $\{c \mapsto m\}$ are cut sets, but only the latter is FDS-minimal.
However, the algorithm of Fig. 3 will return both, since they both correspond to
subset-minimal models with respect to the set of variables $I_{c,f}$.
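
The mismatch described in Example 6 can be reproduced with a few lines of Python. The sketch assumes that the variables $F_{c,f}$ have been eliminated by hand (since next is empty, $F_{c,f}$ holds exactly when $I_{c,f}$ does); the names `MODES`, `RANK`, and `phi_cs` are illustrative.

```python
from itertools import product

MODES = ["m", "top"]                     # non-bottom failure modes, m < top
RANK = {"bot": 0, "m": 1, "top": 2}      # the chain bot < m < top


def phi_cs(I):
    """Direct encoding for Example 6 with the F-variables eliminated."""
    tle = I["m"] or I["top"]             # c is in mode m or in mode top
    once = not (I["m"] and I["top"])     # at most one initial failure mode
    return tle and once


def true_set(I):
    return frozenset(f for f in MODES if I[f])


def model_to_state(I):
    """modelToState for the single component c."""
    chosen = true_set(I)
    return next(iter(chosen)) if chosen else "bot"


models = [dict(zip(MODES, bits)) for bits in product([False, True], repeat=2)]
models = [I for I in models if phi_cs(I)]
subset_minimal = [
    I for I in models if not any(true_set(J) < true_set(I) for J in models)
]
returned = sorted(model_to_state(I) for I in subset_minimal)
fds_minimal = [
    f for f in returned if not any(RANK[g] < RANK[f] for g in returned)
]
```

Both models are subset-minimal because their true-sets are incomparable, so subset-minimal enumeration returns both failure modes even though only `m` is FDS-minimal.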


We can adapt the procedure of Fig. 3 to arbitrary FDSs by using an encoding
in which the ordering of assignments to the $I_{c,f}$ variables corresponds to the
severity ordering $\leq$ of the underlying FDS $D$. In order to do this, we exploit the
isomorphism between $D = (FM, \leq, \bot)$ and the poset $D{\downarrow}$ of its lower subsets
generated by single elements, defined as
$D{\downarrow} = \{\{f' \in FM \mid f' \leq f\} \mid f \in FM\}$ with partial order $\subseteq$ and the least
element $\{\bot\}$. For example, the poset $(W3F{\downarrow}, \subseteq)$ for the FDS W3F of Fig. 1 is
shown in Fig. 4, together with an encoding of the elements as formulas.

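With this isomorphism in mind, we define for each $c \in C$ and $f \in FM$ the
formula $\psi_{c=f}$ that represents the failure mode $f$ of component $c$ by assigning
the subset of variables $\{I_{c,f'} \mid f' \leq f\}$ to true:


$$\psi_{c=f} = \bigwedge_{\substack{f' \in FM \\ f' \leq f}} I_{c,f'} \;\wedge \bigwedge_{\substack{f' \in FM \\ f' \not\leq f}} \neg I_{c,f'}.$$


The important property of this definition is that for all $c \in C$, $f, f' \in FM$
and assignments $\mu \models \psi_{c=f}$ and $\mu' \models \psi_{c=f'}$, we have $f \leq f'$ if and only if
$\{I_{c,f''} \mid \mu(I_{c,f''}) = true\} \subseteq \{I_{c,f''} \mid \mu'(I_{c,f''}) = true\}$.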
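
The stated property can be verified exhaustively for a small FDS. The sketch below does so for W3F, again assuming the order w < fs and w < fd < fu read off Figs. 1 and 4; `psi` returns the unique assignment to the variables $I_{c,g}$ that satisfies $\psi_{c=f}$, and all names are illustrative.

```python
FM = ["w", "fs", "fd", "fu"]
# Strict order assumed from the Hasse diagram: w < fs and w < fd < fu.
STRICT = {("w", "fs"), ("w", "fd"), ("w", "fu"), ("fd", "fu")}


def leq(f, g):
    """Partial order of W3F."""
    return f == g or (f, g) in STRICT


def down_set(f):
    """Lower set generated by f: all failure modes below or equal to f."""
    return frozenset(g for g in FM if leq(g, f))


def psi(c, f):
    """The assignment to the variables I_{c,g} that satisfies psi_{c=f}."""
    return {(c, g): g in down_set(f) for g in FM}


def true_vars(assignment):
    """The variables assigned to true."""
    return frozenset(v for v, value in assignment.items() if value)


def order_is_reflected(c="c"):
    """f <= g holds exactly when the true-set of psi_{c=f} is included
    in the true-set of psi_{c=g}."""
    return all(
        leq(f, g) == (true_vars(psi(c, f)) <= true_vars(psi(c, g)))
        for f in FM
        for g in FM
    )
```

The lower sets computed by `down_set` coincide with the four elements shown in Fig. 4.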

We then modify the encoding y,, of Sect. 4 as follows: 


1. First, we modify nert to encode the initial state by using w=, instead of 
Ie f. This ensures that the ordering of assignments to the initial variables 
reflects the ordering given by the underlying FDS. We also remove the mutual 
exclusion constraints on the variables Ie f from Yonce, because the mutual 
exclusion of initial failure modes is now guaranteed by the definition of =: 
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net = N (Eas > (eas V (2))) A (one > Faf), 
cEC 
fEFM\{1} 
Ponce = VAN (= cf V7 gnf) 
cEC 
f.f/€FM\{L} 
LAL 


2. Then, we add domain constraints that ensure that the resulting formula rep- 


resents only models with assignments to Ie, ¢ that correspond to elements of 
D}: 


on =N V ter 


cEC fEFM 
The new encoding is then given by yM: 
pee = YTLE A^ Pneat A Ponce A PD,- 


The modified encoding $\varphi^{FM}_S$ represents the cut sets in a different way: instead
of representing the failure modes directly by $I_{c,f}$ as in $\varphi_S$, they are now represented
by the subformulas $\psi_{c=f}$. Therefore, to prove correctness of the modified
encoding, the function modelToState that maps models to cut sets also has to
be changed. We define the initial state $modelToState^{FM}(\mu)$ corresponding to the
model $\mu$ by $modelToState^{FM}(\mu)(c) = \max\{f \in FM \mid \mu(I_{c,f}) = \mathit{true}\}$. Note that
the maximum is guaranteed to exist because of the $\varphi_{D\downarrow}$ constraint.
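The lower-set encoding and the recovery of a failure mode as a maximum can be illustrated by a minimal Python sketch. The four-element FDS below (a diamond with bottom `ok`, two incomparable modes `a` and `b`, and top `ab`) is a hypothetical example and is not the FDS W3F of Fig. 1; the helper names `down_set` and `model_to_state` are likewise illustrative only.

```python
from itertools import product

# Hypothetical FDS: bottom "ok", incomparable modes "a" and "b", top "ab".
FM = ["ok", "a", "b", "ab"]
LEQ = {(f, f) for f in FM} | {("ok", f) for f in FM} | {(f, "ab") for f in FM}


def leq(f, g):
    """Severity ordering of the hypothetical FDS."""
    return (f, g) in LEQ


def down_set(f):
    """Failure modes f' whose variable I_{c,f'} is set to true by psi_{c=f}."""
    return frozenset(g for g in FM if leq(g, f))


def model_to_state(true_vars):
    """Recover the failure mode as the maximum of the true variables."""
    maxima = [f for f in true_vars if all(leq(g, f) for g in true_vars)]
    if len(maxima) != 1:
        raise ValueError("the true variables have no unique maximum")
    return maxima[0]


# f <= f' holds exactly when the set for f is contained in the set for f'.
order_preserved = all(
    leq(f, g) == (down_set(f) <= down_set(g)) for f, g in product(FM, FM)
)
# Decoding an encoded failure mode gives back the same failure mode.
round_trip = all(model_to_state(down_set(f)) == f for f in FM)
print(order_preserved, round_trip)
```

An assignment such as `{"ok", "a", "b"}` has no maximum in this FDS, which is the kind of model that the domain constraint is meant to exclude.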


Theorem 3. For an arbitrary PGFDS $S$ and a top level event TLE,


$$CS(S, TLE) \subseteq \{modelToState^{FM}(\mu) \mid \mu \models \varphi^{FM}_S\}.$$


Moreover, if $S$ is subset-monotone, these sets are equal.


Therefore, the algorithm MCS-enumeration from Fig. 3 can be used to enumerate
FDS-minimal cut sets of a subset-monotone PGFDS, given as the inputs
the modified encoding $\varphi^{FM}_S$ and the modified function $modelToState^{FM}$. This is
formalized by the following theorem:


Theorem 4 (MCS enumeration for general FDS). For a subset-monotone
PGFDS $S$ over an FDS $D$, the result of MCS-enumeration($\varphi^{FM}_S$, $modelToState^{FM}$)
is the set of all FDS-minimal cut sets of $S$.
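The statement of Theorem 4 can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: it uses no SMT solver, the two components `c1` and `c2` and the monotone predicate `is_cut_set` are hypothetical, and the FDS is the chain w < fd < fu. It enumerates all cut sets, encodes each one as the set of variables that are true under the lower-set encoding, keeps the subset-minimal sets, and compares the decoded result with the FDS-minimal cut sets computed directly.

```python
from itertools import product

FM = ["w", "fd", "fu"]                  # chain w < fd < fu
RANK = {f: i for i, f in enumerate(FM)}
COMPONENTS = ["c1", "c2"]               # hypothetical components


def is_cut_set(state):
    """Hypothetical monotone top level event over the two components."""
    return RANK[state["c1"]] + RANK[state["c2"]] >= 2


def encode(state):
    """Variables (c, f') with f' <= state[c], i.e. those set to true."""
    return frozenset(
        (c, f) for c in COMPONENTS for f in FM if RANK[f] <= RANK[state[c]]
    )


def decode(true_vars):
    """Map a model back to a state by taking the maximum per component."""
    return {
        c: max((f for (d, f) in true_vars if d == c), key=RANK.get)
        for c in COMPONENTS
    }


def state_leq(s, t):
    """Pointwise severity ordering on states."""
    return all(RANK[s[c]] <= RANK[t[c]] for c in COMPONENTS)


states = [dict(zip(COMPONENTS, fs)) for fs in product(FM, repeat=len(COMPONENTS))]
cut_sets = [s for s in states if is_cut_set(s)]

# FDS-minimal cut sets, computed directly from the definition.
fds_minimal = [
    s for s in cut_sets if not any(state_leq(t, s) and t != s for t in cut_sets)
]

# Subset-minimal models of the encoding, decoded back to states.
models = [encode(s) for s in cut_sets]
subset_minimal = [m for m in models if not any(n < m for n in models)]
via_encoding = [decode(m) for m in subset_minimal]


def as_tuple(state):
    return tuple(state[c] for c in COMPONENTS)


print(sorted(map(as_tuple, fds_minimal)) == sorted(map(as_tuple, via_encoding)))
```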


Note that our encoding of FDS-minimality is general and does not depend on 
the algorithm for enumeration of subset-minimal models. Indeed, thanks to our 
encoding, any off-the-shelf minimal-model enumerator can be used to enumerate 
FDS-minimal models. Therefore, any improvements to minimal model enumera- 
tion directly translate to improved performance of our method for FDS-minimal 
cut set enumeration. From the opposite point of view, our encoding can in prin- 
ciple be employed by other tools to reduce FDS-minimal cut set enumeration to 
subset-minimal cut set enumeration.


6 Related Work


Finite Degradation Models (FDMs) [14] are an algebraic framework accommodating
the concept of fault degradation, where faults may have different values
organized into a semi-lattice. Using FDMs, (probabilistic) safety analysis (fault
trees and minimal cut sets) can be generalized from Boolean models to multi-state
systems. Compared to FDMs, fault-persistent PGFDSs differ in two significant
aspects: first, since the function next returns a set of possible next failure
modes, PGFDSs allow non-determinism in the failure propagation, i.e., the failure
of a component is not uniquely determined by the failure modes of its dependencies.
Second, and more importantly, PGFDSs allow cyclic dependencies and
give them well-defined and expected semantics. Since the work on FDMs is the
closest to ours, we shall discuss it in detail below.

In [8] the authors present a framework for failure propagation which enables 
modeling sets of failure modes using a domain specific language. It is less expres- 
sive than FDMs, in that sets of failure modes cannot be related by degradation 
orders, which significantly simplifies the enumeration of MCSs. Finally, classical
formalisms for failure propagation, which are less expressive than FDS, include FPTN [9]
and HiP-HOPS [11].

TFPGs (Timed Failure Propagation Graphs) [1] extend fault propagation
models by enabling the specification of time bounds and mode constraints on
the propagation links. However, TFPGs do not consider degradation, and they 
do not support cyclic dependencies. Conversely, the PGFDS formalism can be 
easily extended to support time bounds, failure probabilities, mode constraints, 
and constraints on propagation delays similar to those available in TFPGs (e.g., 
following [5]). Moreover, once the minimal cut sets of a PGFDS are computed, 
the existing approach to computing probability of overall failure [5] can be used 
almost unchanged. 

Finally, xSAP [3] is a safety analysis platform that supports library-based 
fault models and the generation of safety artifacts for fully general behavioral 
models, e.g., it can generate fault trees and minimal cut sets for arbitrary transi- 
tion systems [6]. Currently, xSAP does not support FDS and degradation models. 


6.1 Detailed Comparison with Finite Degradation Models 


As outlined above, the formalism Finite Degradation Models (FDMs), introduced
in [14], is closely related to our PGFDS. Here, we describe FDMs in further detail
and show that PGFDS are a strict generalization of FDMs, obtained by (i) considering
non-determinism in the propagation of failures, and (ii) allowing cyclic
dependencies among the components.

Each FDM has state variables, which correspond to the sources of failures in
the system, and flow variables, which correspond to the propagated consequences
of these failures. Each flow variable has an associated equation, which prescribes
the failure mode of the corresponding flow variable based on the failure modes
of state variables and other flow variables. We assume that the failure modes of
all state and flow variables are modeled by the FDS $D = (FM, \leq, \bot)$.⁶


Definition 9 (Finite Degradation Model [14]). Given an arbitrary FDS $D =
(FM, \leq, \bot)$, a Finite Degradation Model (FDM) is a pair $M = (V = S \uplus F, E)$,
where


- $S = \{V_1, \ldots, V_m\}$ is a finite set of state variables,

- $F = \{W_{m+1}, \ldots, W_{m+n}\}$ is a finite set of flow variables,

- $E = \{W_{m+1} := \phi_{m+1}, \ldots, W_{m+n} := \phi_{m+n}\}$ is a finite set of equations, where
each $\phi_{m+i}$ for $1 \leq i \leq n$ is a function of type $FM^V \to FM$.


We say that a flow variable $W_{m+i}$ depends on a variable $v$ if the function $\phi_{m+i}$
depends on $v$. An FDM is called acyclic if there are no cyclic dependencies among
its flow variables, i.e., no flow variable transitively depends on itself. We stress
that, in contrast to our definitions of PGFDS, the original paper [14] only deals
with acyclic FDMs and does not provide semantics and necessary definitions for
cyclic FDMs. We thus assume in the rest of the section that all FDMs are acyclic.

An assignment $\sigma\colon V \to FM$ is called admissible if the failure modes assigned
to the flow variables satisfy all the corresponding equations, i.e., $\sigma(W_{m+i}) =
\phi_{m+i}(\sigma)$ for each $1 \leq i \leq n$. The assumption of acyclicity of FDMs, together with
the fact that all equations are deterministic functions and not general relations,
guarantees that in each admissible assignment, failure modes of the flow variables
are uniquely determined by the failure modes of the state variables. This defines
a function $[\![M]\!](\sigma) = \overline{\sigma}_M$, which maps each state variable assignment $\sigma$ to its
unique admissible extension $\overline{\sigma}_M$ that assigns values to all variables. This is in
stark contrast to PGFDS, where a single initial state can give rise to multiple
different propagation paths.

The FDM counterpart of our notion of top level event is the notion
of observer. An observer is a pair $(R, U)$, where $R$ is a flow variable and $U \subseteq FM$
is a set of failure modes. Intuitively, the observer represents a set of dangerous
failure modes of the given flow variable. A cut set is any assignment $\sigma\colon S \to FM$
of failure modes to state variables such that $[\![M]\!](\sigma)(R) \in U$.

A notion related to our notion of monotonicity for FDM is coherence. The
observer is coherent if for all assignments $\sigma, \sigma'\colon S \to FM$ such that $\sigma$ is a cut
set and $\sigma \leq \sigma'$, the assignment $\sigma'$ is also a cut set.
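The notions of admissible extension, observer, cut set, and coherence can be made concrete with a small Python sketch. Everything in it is a hypothetical example chosen for illustration: the chain w < fd < fu as the FDS, two state variables `V1` and `V2`, and two flow equations `W3` and `W4` listed in dependency order, which is what allows the unique admissible extension to be computed in a single pass.

```python
from itertools import product

FM = ["w", "fd", "fu"]                  # chain w < fd < fu
RANK = {f: i for i, f in enumerate(FM)}
STATE_VARS = ["V1", "V2"]


def worst(*modes):
    """Most severe of the given failure modes."""
    return max(modes, key=RANK.get)


# Hypothetical acyclic equations, listed in dependency order (W4 uses W3).
EQUATIONS = {
    "W3": lambda a: worst(a["V1"], a["V2"]),
    "W4": lambda a: a["W3"] if a["V2"] != "w" else "w",
}


def admissible_extension(sigma):
    """Extend an assignment of the state variables to all variables."""
    full = dict(sigma)
    for flow_var, equation in EQUATIONS.items():
        full[flow_var] = equation(full)
    return full


def is_cut_set(sigma, observer):
    """Check whether the observed flow variable ends in a dangerous mode."""
    flow_var, dangerous = observer
    return admissible_extension(sigma)[flow_var] in dangerous


def is_coherent(observer):
    """Check that every assignment above a cut set is again a cut set."""
    sigmas = [dict(zip(STATE_VARS, fs)) for fs in product(FM, repeat=len(STATE_VARS))]
    for s, t in product(sigmas, repeat=2):
        s_below_t = all(RANK[s[v]] <= RANK[t[v]] for v in STATE_VARS)
        if s_below_t and is_cut_set(s, observer) and not is_cut_set(t, observer):
            return False
    return True


print(admissible_extension({"V1": "fd", "V2": "fu"}))
print(is_coherent(("W4", {"fu"})), is_coherent(("W4", {"fd"})))
```

In this example the observer `("W4", {"fu"})` is coherent, whereas `("W4", {"fd"})` is not, because raising `V2` from fd to fu turns a cut set into an assignment that is no longer a cut set.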

Each FDM $M$ can be translated to a PGFDS $S_M$ such that the cut sets of
$M$ correspond to the cut sets of $S_M$. Moreover, if the FDM $M$ is coherent, the
resulting PGFDS $S_M$ is guaranteed to be subset-monotone. This enables efficient
analysis of coherent FDMs by our SMT-based technique. Intuitively, the PGFDS
$S_M$ has one component for each state variable of $M$ and an additional component
$R$ for the observer flow variable $R$. The next function is defined in a way that the
failure modes of all the components that correspond to state variables cannot
change and that the component $R$ can switch to a predefined set of failure modes
if $[\![M]\!](\sigma)(R) \in U$. This is achieved by composing all equations for the flow variables.
If local variables are used in the symbolic encoding⁷, the size of the result is
guaranteed to be polynomial.


6 Both FDMs and our PGFDS can be defined over multiple different FDSs for different
variables. Such generalization is straightforward, but it complicates the notation and
the exposition significantly.


Table 1. Classes of PGFDS and their traces that each of the compared tools can handle
precisely.

Tool       FDS        Cycles  Nondeterministic  Not fault-persistent
Emmy       Arbitrary  No      No                No
xSAP       Boolean    Yes     Yes               Yes
SMT-PGFPS  Arbitrary  Yes     Yes               No


7 Experimental Evaluation 


To evaluate the performance and scalability of our approach, we have imple- 
mented the proposed algorithm MCS-enumeration in a simple Python tool that 
uses the solver MathSAT [7], which supports all the required functionalities that 
are described in Sect. 5.1. In this section, we refer to the tool as SMT-PGFPS. 

As a comparison, we have used Emmy [13], a tool based on decision diagrams 
for the enumeration of FDS-minimal cut sets of FDMs, and xSAP [3], a tool 
for safety assessment for arbitrary transition systems. Each of these tools only 
supports a subset of the capabilities of our approach, as summarized in Table 1. 


Emmy supports minimal cut set enumeration with respect to an arbitrary order- 
ing of failure modes given by an FDS, but only for acyclic and deterministic 
FDMs; 

xSAP supports analysis of arbitrary transition systems with cycles, given that it 
internally relies on the nuXmv model checker. However, it cannot enumerate 
FDS-minimal cut sets, but only subset-minimal ones. Note that for computation
of subset-minimal cut sets, xSAP is more general than our approach,
as it supports general transition systems and arbitrary temporal properties. 
However, we use xSAP as a baseline to compare performance and scalability 
of our approach for cyclic PGFDS because it is a subcase of general tran- 
sition systems that is important in practice. In the evaluation, we use the 
IC3-based engine described in [6] (denoted as xSAP-IC3). Note that this 
algorithm assumes that the verified property is monotone and leverages this 
assumption for efficiency. 


7 For example, let-expressions of form (let ((var definition) ...) body) in SMT-LIB.

For the comparison, we have created three sets of benchmarks:


Scalable acyclic benchmarks consisting of linear structures extended by a 
triple modular redundancy scheme. The basic architecture of these structures 
is parameterized by its size n and the system contains 6n components: 3n 
modules and 3n voters. These benchmarks use the FDS W2F, which is a 
restriction of the FDS W3F of Fig. 1 to failure modes {w, fd, fu}, with the
ordering w < fd < fu. 

Note that FDS-minimal cut sets of these benchmarks cannot be enumerated 
by xSAP, as the benchmarks use a non-Boolean FDS. 

Randomly generated systems with cycles over Boolean FDS which share 
some structural properties with real-world systems. In particular, we gen- 
erated random systems that have a similar distribution of in-degrees and 
out-degrees of the components as our proprietary systems, which we cannot 
disclose. We have generated 950 such systems of sizes ranging between 50 and 
1000 components. We have used the Boolean FDS for these benchmarks, so
that they can be precisely analyzed also by xSAP. 

Note that these benchmarks cannot be solved by Emmy, as they contain cyclic 
dependencies among the components. 

Randomly generated systems over W2F which are created from the above- 
mentioned randomly generated systems by using the FDS W2F instead of 
the Boolean one. Although this does not change the overall structure of the 
system, it makes the transition relation more complicated and significantly
increases the number of minimal cut sets.

In the evaluation, we only used systems of size at most 400, as both the 
compared approaches timed out on the vast majority of larger systems. 
Note that these benchmarks cannot be solved by Emmy, as they contain cyclic 
dependencies among the components. They can be solved by xSAP, but the 
generated cut sets are only subset-minimal with respect to fault variables, 
and not (in general) FDS-minimal. 


For the scalable benchmarks, we have generated encodings in the SMT for- 
mat described in this paper and in the FDS-ML format used by Emmy. For 
the randomly generated cyclic benchmarks, we have generated encodings in the 
SMT format and in the SMV format used by xSAP. The SMV encodings also 
include the assumption of fault-persistence. All the used benchmarks are subset- 
monotone, and therefore our SMT-based approach can be used to compute the 
set of minimal cut sets correctly. 

We have used a wall-time limit of 30 min for each solver-benchmark pair. All
experiments were performed on a Linux laptop with Intel Core i7-8665U CPU 
and 32 GiB of RAM. 

A comparison of SMT-PGFPS and Emmy on the scalable acyclic benchmarks 
can be seen in Table 2. It shows that Emmy times out already on systems of size 
5, i.e., on systems with 30 components. On the other hand, our approach is able 
to scale to systems with three thousand components. 

Table 2. Numbers of minimal cut sets (#MCS) and solving times for top level events
failed detected (fd) and failed undetected (fu) on linear systems extended with triple-modular
redundancy scheme. Note that the system with size n consists of 6n components
(column #Comp).

              Failed detected               Failed undetected
Size  #Comp   #MCS   Emmy (s)   SMT (s)     #MCS    Emmy (s)   SMT (s)
1     6       4      0.051      0.001       7       0.071      0.001
2     12      16     0.137      0.002       31      0.172      0.003
3     18      28     3.052      0.004       55      3.141      0.007
4     24      40     160.493    0.006       79      163.556    0.013
5     30      52     >1800      0.009       103     >1800      0.017
10    60      112    >1800      0.032       223     >1800      0.063
100   600     1192   >1800      3.456       2383    >1800      6.748
500   3000    5992   >1800      171.042     11983   >1800      328.737


A comparison against the sequential approach of xSAP on cyclic benchmarks
can be seen in Fig. 5. Figures 5a and 5b show that on random systems over
Boolean FDSs, our approach significantly outperforms the sequential approach
of xSAP. As the size of the system grows, the difference can be up to several
orders of magnitude. Both xSAP and SMT-PGFPS compute exactly the same
minimal cut sets. Hence, the dramatic difference in performance can be explained
by the reduction to the combinational case, which prevents the unrolling
of the transition relation by implicitly encoding the propagations in the total
ordering(s) found by the SMT solver.

The performance difference on the systems over the FDS W2F, shown in 
Figures 5c and 5d, is even more pronounced. This can be caused by two additional 
factors. First, the systems over the FDS W2F have a more complicated transition
relation, more minimal cut sets, and are in general harder. Thus, the unrolling
performed by xSAP is even more costly. Second, xSAP has to enumerate more
cut sets, because it is enumerating all subset-minimal cut sets and not only
FDS-minimal cut sets. However, this cannot be the main source of the observed
performance gap: on 35 of the 113 benchmarks on which both xSAP and
SMT-PGFPS finished before timeout, the numbers of cut sets are the same;
on the remaining 78 benchmarks, xSAP enumerates on average 6% more cut 
sets and at most 62% more cut sets. In order to obtain FDS-minimal cut sets 
from xSAP, the produced subset-minimal cut sets would have to be filtered 
or explicitly minimized, which would add yet another performance penalty for 
xSAP. 

Overall, the SMT-based techniques presented in this paper yield a fundamental
advancement with respect to the state of the art, both in terms of expressiveness
and in terms of performance.

Fig. 5. Comparison of SMT-PGFPS and xSAP-IC3 on random cyclic systems.


8 Conclusions and Further Work 


We tackled the problem of supporting the Preliminary Safety Assessment phase 
of aircraft design. Specifically, we defined an expressive framework for modeling 
failure propagation over components with multiple levels of degradation, with 
nondeterminism and cyclic dependencies. We presented a sequential semantics
and proved that the problem can be tackled by means of minimal model enumeration
in SMT. The framework is more expressive than the state of the art,
and the proposed method outperforms the BDD-based techniques from [14] on 
acyclic benchmarks over generic FDSs, and the model checking techniques of [6] 
on cyclic benchmarks. 

In the future, we are going to introduce timing constraints and analyze redun- 
dancy architectures. We also investigate ways to relax the monotonicity and 
fault-persistence assumptions to explore recovery mechanisms and to further 
extend the reach of our approach. We are also working on encoding the causal- 
ity constraints in the frameworks of SAT modulo acyclicity [10] and ASP modulo 
acyclicity [4], which could improve the performance of our approach even further.


Abstract. Erroneous behavior of verification back ends such as SMT
solvers requires effective and efficient techniques to identify, locate and
fix failures of any kind. Manual analysis of large real-world inputs usu- 
ally becomes infeasible due to the complex nature of these tools. Delta 
Debugging has emerged as a valuable technique to automatically reduce 
failure-inducing inputs while preserving the original erroneous behav- 
ior. We present ddSMT 2.0, the successor of the delta debugger ddSMT. 
ddSMT is the current de-facto standard delta debugger for the SMT- 
LIBv2 language. Our tool improves and extends core concepts of ddSMT 
and extends input language support to the entire family of SMT-LIBv2 
language dialects. In addition to its ddmin-based main minimization 
strategy, it implements an alternative, orthogonal strategy based on hier- 
archical input minimization. We combine both strategies into a hybrid 
strategy and show that ddSMT 2.0 significantly improves over ddSMT 
and other delta debugging tools for SMT-LIBv2 on real-world examples. 


1 Introduction 


In recent years, a growing number of formal methods applications (e.g., [6,8]) 
rely on Satisfiability Modulo Theories (SMT) solvers as the back end. Current 
state-of-the-art SMT solvers are typically complex pieces of software, and debug- 
ging erroneous behavior requires effective and efficient techniques to analyze 
failure-inducing input with the purpose of identifying and locating the cause 
of the failure. Manual analysis of real-world problems that trigger a particular 
unwanted behavior is very often infeasible for large inputs, mainly due to the 
complex nature of these tools. 

Erroneous behavior is never only triggered by a single unique input, but 
by a class of inputs that share a common trait. Extracting a minimal working 
example, i.e., an input that is as small as possible but still triggers the original 
faulty behavior, from such a class of inputs usually significantly decreases the 
time to identify and locate the cause of the failure. While ideally, the notion of 
size of an input directly correlates to the effort required to determine the failure
cause, in practice this is hard to quantify. We instead use metrics such as file
size, number of language constructs, and solver runtime until the failure occurs. 

Finding such minimal working examples, however, is a problem of its own. 
Manual minimization is typically infeasible in practice, simply due to the large 
number of possible simplifications that may even depend on each other. Delta 
debugging techniques, on the other hand, provide automated means to minimize 
failure-inducing inputs. This typically entails first reading some input, applying a
set of rules to simplify the input, and then checking that the modified input still
triggers the original behavior. Delta debugging in its simplest form [24] extracts
a minimal working example by omitting parts of the input that are irrelevant 
for triggering the original faulty behavior. More input language specific tools 
perform additional simplifications to further minimize the input. All of these 
simplifications are typically performed until a fixed point is reached. 
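The read, simplify, and check loop described above can be sketched in a few lines of Python. This is a simplified greedy, line-based variant written for illustration only; it is neither Zeller's ddmin algorithm [24] nor the strategy implemented in ddSMT. The function name `minimize` and the oracle `still_fails` are assumptions, and the oracle in the usage example is a toy stand-in for running a solver.

```python
def minimize(lines, still_fails):
    """Greedily remove chunks of lines while `still_fails` keeps holding."""
    assert still_fails(lines), "the original input must trigger the failure"
    changed = True
    while changed:                      # repeat until a fixed point is reached
        changed = False
        chunk = max(len(lines) // 2, 1)
        while chunk >= 1:
            i = 0
            while i < len(lines):
                candidate = lines[:i] + lines[i + chunk:]
                if candidate and still_fails(candidate):
                    lines = candidate   # keep the smaller input, retry here
                    changed = True
                else:
                    i += chunk          # this chunk is needed, move on
            chunk //= 2
    return lines


original = [
    "(set-logic QF_LIA)",
    "(declare-const x Int)",
    "(declare-const y Int)",
    "(assert (> y 1))",
    "(assert (> x 0))",
    "(check-sat)",
]


def toy_oracle(lines):
    """Toy failure: needs the declaration of x and the assertion on x."""
    return "(declare-const x Int)" in lines and "(assert (> x 0))" in lines


reduced = minimize(original, toy_oracle)
print(reduced)
```

The outer loop restarts with large chunks after every successful pass, so the procedure stops only when no single removal at any chunk size preserves the failure.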

For the design of a delta debugger, this process raises a number of ques- 
tions: How does the debugging tool check for “same behavior” of a tool on some 
input? Which simplification rules should be employed and how should they be 
combined? To what (syntactic and semantic) degree should the delta debugger 
itself understand the input language? In this paper, we address these questions in 
the context of delta debugging for the SMT-LIBv2 language and its dialects with 
our delta debugger ddSMT 2.0, the successor of ddSMT [18]. In the following, 
we will refer to ddSMT 2.0 as ddSMTv2, and to its predecessor as ddSMTv1. 


Related Work. Generic delta debugging tools that are agnostic to the input lan- 
guage can be surprisingly efficient for some use cases. For minimizing SMT-LIB 
input, however, their usefulness is usually rather modest. One such generic tool is 
linedd [4], which solely performs line-based simplifications. The first delta debug- 
ging tool specific to the SMT-LIB language was presented in [7] as deltaSMT 
and targeted SMT-LIBv1 [22]. Three years later, the SMT community adopted 
a new input language, SMT-LIBv2. In 2013, an updated version of deltaSMT [10]
extended the tool syntactically for SMT-LIBv2 compliance, but limited to the 
feature set of the SMT-LIBv1 language and without full SMT-LIBv2 support. 
Note that this updated version is not available anymore. In the same year, 
ddsexpr [5], a generic hierarchical delta debugger for S-expressions (and thus 
applicable to the SMT-LIB language family), and ddSMTv1 [18], a delta debug- 
ger specific to the SMT-LIBv2 language, were presented. The latter implements a 
variant of Zeller’s ddmin algorithm [24] and is considered the current de facto
standard delta debugger in the SMT community. The only other delta debugging
tool specific to the SMT-LIBv2 language we are aware of is delta [15], a hierar- 
chical delta debugger shipped together with the SMT solver SMT-RAT [9]. A 
reimplementation of delta in Python is available as pyDelta at [14]. 


Contributions. In this paper, we present ddSMTv2, a delta debugging tool for 
the SMT-LIBv2 [2] language and its dialects. It supports the entirety of the 
SMT-LIBv2 standard as well as non-standardized extensions and derived formats
such as the SyGuS input language [21]. Our tool is agnostic to future
extensions of the standard in the sense that it does not require any modifications
for basic support. It is easy to extend, and extensions will only be required
for simplifications that are specific to new language features or a certain dialect 
of the SMT-LIBv2 language. In this sense it will also immediately support the 
SMT-LIBv3 [1] language, which is currently under development. 

ddSMTv2 is the successor of the delta debugger ddSMTv1 [18] and incorporates,
improves and extends its core concepts. It also implements an improved
variant of the hierarchical approach of pyDelta as an alternative, orthogonal
strategy, and allows combining these two strategies in a hybrid manner.
ddSMTv2 is intended to overcome major weaknesses of ddSMTv1, which is limited
to the SMT-LIBv2 language and does not support the full set of standardized
background theories or language extensions to the point where it is even unable
to parse the input file. ddSMTv2 further extends the set of theory-specific simplifications
over both ddSMTv1 and pyDelta, which makes it possible to exploit even more
minimization opportunities.

ddSMTv2 is implemented in Python and can be installed via pip3 install 
ddsmt. Its documentation is available at [11], and its source code is available 
under version 3 of the GNU General Public License (GPLv3) at [13]. 


2 Detecting Failure-Inducing Inputs 


An SMT solver is a fully automated tool to determine the satisfiability of a
first order logic formula modulo some background theories and their combinations.
For satisfiable inputs, SMT solvers optionally allow querying a model,
whereas for unsatisfiable inputs, some optionally generate a proof of unsatisfiability.
Additionally, SMT solvers usually provide a plethora of configuration
options.

Within the SMT community, the notion of failure is generally defined as any- 
thing from abnormal termination or crashes (including segmentation faults and 
assertion failures), to performance regressions (one solver performs significantly 
worse on an input than a reference solver), unsoundness (answering sat instead 
of unsat and vice versa), incorrect models or incorrect proofs of unsatisfiability.
In the following, we define a failure-inducing input to an SMT solver as an
SMT-LIB input that triggers a failure. In particular, we do not consider options 
configured via command line as part of the input. 

Strategies to determine if a minimized input still triggers the original faulty 
behavior typically differ depending on the kind of the failure. For abnormal 
termination or crashes, it is usually sufficient to compare the exit code of the 
solver call, optionally with additional comparisons of output on the standard 
output and error channels. For failures that generate error messages that include 
memory addresses, it is often useful to not compare the full output, but to only 
match against a specific phrase that occurs in the original error output. 

By default, ddSMTv2 does exactly that: it determines if a simplified input 
has the same erroneous behavior as the original input by comparing the exit 
code and the output on the standard output and error channels for equality.
Standard output and error output can optionally be ignored or matched against
user-defined strings via command line options. 
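A behavior check of this kind can be sketched as follows. The code is an illustration under stated assumptions and does not reproduce ddSMT's implementation or its command line options: the names `Behavior`, `run`, and `same_behavior` are hypothetical, and the "solver" is a tiny Python script whose error message contains a number that varies with the input, mimicking a memory address in an assertion message.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    exit_code: int
    stdout: str
    stderr: str


def run(cmd, input_text, timeout=10.0):
    """Run `cmd` on `input_text` (via stdin) and record what it did."""
    proc = subprocess.run(
        cmd, input=input_text, capture_output=True, text=True, timeout=timeout
    )
    return Behavior(proc.returncode, proc.stdout, proc.stderr)


def same_behavior(reference, observed, ignore_stdout=False,
                  ignore_stderr=False, match_stderr=None):
    """Compare exit code and output channels, optionally relaxed."""
    if reference.exit_code != observed.exit_code:
        return False
    if not ignore_stdout and reference.stdout != observed.stdout:
        return False
    if match_stderr is not None:
        return match_stderr in observed.stderr
    return ignore_stderr or reference.stderr == observed.stderr


# Hypothetical stand-in for a solver; the reported number depends on the input.
FAKE_SOLVER_SRC = """
import sys
text = sys.stdin.read()
if "bad" in text:
    sys.stderr.write("assertion failed at 0x%x\\n" % len(text))
    sys.exit(134)
print("sat")
"""
FAKE_SOLVER = [sys.executable, "-c", FAKE_SOLVER_SRC]

reference = run(FAKE_SOLVER, "(bad) (padding)")
smaller = run(FAKE_SOLVER, "(bad)")
print(same_behavior(reference, smaller))
print(same_behavior(reference, smaller, match_stderr="assertion failed"))
```

The strict comparison rejects the smaller input because the reported number changed, while matching against the phrase `assertion failed` accepts it, which is the situation described above for error messages that include memory addresses.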

Performance regressions are trickier and typically involve helper scripts
that call two solver configurations with some time limit and return a specific 
exit code in case the performance regression is triggered. The delta debugger will 
then minimize the input based on this exit code. Inputs that trigger unsoundness 
failures can be dealt with in a similar way. For inputs that reveal performance 
regressions and unsound answers, ddSMTv2 provides easy-to-use wrapper scripts 
that can also be adapted to more specific use cases. 
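The idea of such a helper script can be sketched in Python. This is not one of the wrapper scripts shipped with ddSMTv2; the exit code 42, the slowdown factor, the minimum runtime, and all function names are hypothetical choices made for illustration.

```python
import subprocess
import time

REGRESSION_EXIT_CODE = 42   # hypothetical exit code signalling "regression seen"


def timed_run(cmd, input_path, timeout):
    """Return the wall time of `cmd input_path`, capped at `timeout` seconds."""
    start = time.perf_counter()
    try:
        subprocess.run(cmd + [input_path], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return float(timeout)
    return time.perf_counter() - start


def is_regression(slow_time, fast_time, factor=10.0, min_slow_time=1.0):
    """The slow configuration must be slow in absolute and relative terms."""
    return slow_time >= min_slow_time and slow_time >= factor * fast_time


def wrapper_exit_code(slow_cmd, fast_cmd, input_path, timeout=30.0):
    """Exit code the wrapper would report to the delta debugger."""
    slow = timed_run(slow_cmd, input_path, timeout)
    fast = timed_run(fast_cmd, input_path, timeout)
    return REGRESSION_EXIT_CODE if is_regression(slow, fast) else 0
```

A script built around this sketch would end with `sys.exit(wrapper_exit_code(...))`, so that the delta debugger only has to compare exit codes. The absolute threshold `min_slow_time` guards against accepting inputs on which both configurations are fast and the ratio is dominated by noise.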

Incorrect models and incorrect proofs are more involved since they typically 
require some checking mechanism to determine if a generated model or proof 
is incorrect. Most SMT solvers implement such mechanisms and will throw an 
assertion failure in debug mode when such a failure is detected. For cases that 
are not detected by the solver itself, external checking tools are required. Imple- 
menting such checks is considered out of scope for a debugging tool due to their 
complex nature. 


3 Simplification Rules and Staged Simplification 


Historically, the set of simplification rules for delta debugging has been in general 
rather small and mainly limited to removing or reordering parts of the input. 
Adding structural and semantic simplifications on top of these basic transfor- 
mations has proved successful for the SMT-LIB language, and greatly improves 
performance over language agnostic minimization techniques. The delta debug- 
gers deltaSMT, delta and ddSMTv1 all support structural and semantic sim- 
plifications, albeit to a varying degree. Of these three, ddSMTv1 implements 
the largest set of language-specific simplifications. The SMT-LIB-agnostic delta 
debugger ddsexpr, on the other hand, performs structural simplifications only. 

Additionally, it is beneficial to devise a strategy for when to apply which kind 
of simplification rules to which part of the input in order to avoid generating 
useless test cases. An example for a useless test case is when the declaration of a 
constant is removed before removing all occurrences of this constant. Such a test 
case is useless because it is almost guaranteed to fail due to a parse error in the 
solver instead of triggering the original faulty behavior. It is further beneficial 
to perform simplifications that promise larger overall reduction (e.g., removal of 
commands) early on, in order to reduce the burden of more local, theory-specific 
simplifications (e.g., replacing terms with default values of the same sort). 

We require that applying a simplification rule indeed simplifies the input 
and that it is not possible to cycle between applications of simplification rules in 
order to ensure termination of the minimization procedure. Generally, we define 
simplification in terms of measuring the input size in bytes or in the number of S- 
expressions. We supplement this with specific syntactic and semantic properties, 
e.g., the number of variable binders in a quantified formula, or the degree of 
“sortedness” of children of an S-expression. Intuitively, we say that given an 
input A, a simplification rule yields a simpler input B if the constructs in B are
simpler according to some metric specific to the rule, or if B is smaller than A in
terms of size. As an example for such a metric, consider a simplification rule that 
replaces a value with another value. Such a transformation is only interpreted as 
simpler if the value to be replaced does not already fall into the class of simpler 
values, e.g., for integer values we define the set of simpler values as {0,1}. Thus, 
replacing value 1234 with 0 is a simplification, but replacing 1 with 0 is not. 
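The integer example can be written down as a short sketch. The function name `int_replacements` is hypothetical; the point is only that values already in the set of simple values produce no candidates, so the procedure cannot cycle between 0 and 1.

```python
SIMPLE_INTS = {0, 1}   # the set of "simpler" integer values from the text


def int_replacements(value):
    """Suggest simpler constants, or nothing if `value` is already simple."""
    if value in SIMPLE_INTS:
        return []
    return sorted(SIMPLE_INTS)


print(int_replacements(1234))
print(int_replacements(1))
```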

In ddSMTv2, possible input simplifications are generated by so-called muta- 
tors, which implement simplification rules. They either perform small local 
changes to a given S-expression, or introduce global modifications on the input 
based on that S-expression. Each mutator implements a filter method, which 
checks if the mutator is applicable to the given S-expression. If this is the case, 
the mutator can be queried to suggest (a list of) possible local and global simplifi- 
cations. Mutators are not required to be equivalence or satisfiability preserving. 
They may extract semantic information from the input when needed, e.g., to 
infer the sort of a term, to query the set of declared or defined symbols, to 
extract indices of indexed operators, and more. ddSMTv2 applies a considerably
larger set of simplifications than ddSMTv1 and currently implements 48 muta- 
tors, which range from generic simplifications on S-expressions that require no 
understanding of SMT-LIB, to more theory-specific mutators that make full 
use of SMT-LIB semantics. Each of these mutators is enabled by default and 
can optionally be disabled. Extending ddSMTv2 with a new simplification boils 
down to implementing a filter method and methods to query local and/or global 
mutations in a new mutator class, and registering this class as an active mutator. 
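The structure of a mutator can be sketched as follows. The sketch is an assumption-laden illustration and not ddSMTv2's actual API: S-expressions are modeled as nested Python tuples, only local mutations are shown, and the class names, the `ACTIVE_MUTATORS` list, and the helper `suggest` are hypothetical.

```python
class EraseDoubleNegation:
    """Theory-aware mutator: rewrite (not (not t)) to t."""

    def filter(self, node):
        return (
            isinstance(node, tuple) and len(node) == 2 and node[0] == "not"
            and isinstance(node[1], tuple) and len(node[1]) == 2
            and node[1][0] == "not"
        )

    def mutations(self, node):
        return [node[1][1]]


class ReplaceByChild:
    """Generic S-expression mutator: replace a compound node by an argument.

    This sketch is not sort-aware, so some candidates may be ill-typed.
    """

    def filter(self, node):
        return isinstance(node, tuple) and len(node) > 1

    def mutations(self, node):
        return list(node[1:])


# Registering a mutator amounts to adding an instance to this list.
ACTIVE_MUTATORS = [EraseDoubleNegation(), ReplaceByChild()]


def suggest(node, mutators=ACTIVE_MUTATORS):
    """Collect (mutator name, candidate) pairs from all applicable mutators."""
    suggestions = []
    for mutator in mutators:
        if mutator.filter(node):
            for candidate in mutator.mutations(node):
                suggestions.append((type(mutator).__name__, candidate))
    return suggestions


term = ("not", ("not", ("=", "x", "y")))
for name, candidate in suggest(term):
    print(name, candidate)
```

Each candidate would then be substituted into the input and checked against the solver; candidates that no longer trigger the original behavior are simply discarded.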


4 Parsing and Input Representation 


While the question about the syntactic and semantic degree of understanding 
of the input language may seem silly at first glance, it is indeed warranted and 
actually crucial for the overall design of the delta debugger. The two extreme 
cases are aiming at full understanding of the language, and no understanding, 
i.e., treating the input as a sequence of bytes. The trade-off at hand is mainly 
between the ability to easily devise language compliant simplifications, and the 
burden of infrastructure required for parsing and representing the input, which 
is an additional burden on maintenance in case the input language changes. 
Both deltaSMT and ddSMTv1 aim at full understanding, while most of the 
others try for some intermediate level of abstraction, i.e., a level that does not 
require full understanding of the input language but allows for smarter sim- 
plifications than just manipulating bytes. The line-based delta debugger linedd 
minimizes input by removing lines, whereas ddsexpr is syntax-aware in the sense 
that it understands S-expressions, but without any SMT-LIBv2 specific seman- 
tics. Both delta and pyDelta extend understanding of S-expressions with some 
semantic properties, however, in the case of delta only to a very basic degree 
(it is, e.g., not even aware of sorts). Outside of the context of the SMT-LIBv2 
language, applying an intermediate abstraction approach was successful for the 
original ddmin algorithm [24], which considers change sets (e.g., commits or indi- 
vidual hunks of a commit), and in [23], where the authors use local semantics 
of certain C++ constructs. Another example is presented in [16], which exploits 
the hierarchical structure of an input, independent of the concrete semantics. 

Our main target language is SMT-LIBv2, which is a hierarchically structured 
language where, to cite the SMT-LIBv2 standard [2], “every expression [...] is 
a legal S-expression of Common Lisp”. In contrast to ddSMTv1, in ddSMTv2 we 
aim for an intermediate level of abstraction to ease the burden on infrastructure 
and maintenance and choose to use S-expressions as the main representation of 
the input, just like ddsexpr does. However, additionally, we extract a comprehen- 
sive set of semantic properties to allow for SMT-LIBv2 specific and compliant 
simplifications. Language compliant transformations are a requirement for the 
specific use case of minimizing SMT-LIBv2 input to debug erroneous behavior 
of SMT solvers. This is mainly to avoid generating nonsensical test cases, i.e., 
test cases that an SMT solver will refuse to parse. Even when such test cases 
are refused immediately, if the overwhelming majority of generated test cases is 
nonsensical it can significantly impact the efficiency of our debugging tool. Note 
that we explicitly do not disallow delta debugging non-compliant input. 

ddSMTv2 features a simple S-expression parser and represents S-expressions 
as a lightweight wrapper around built-in Python tuples and strings. Seman- 
tic information is recovered in an ad-hoc manner after parsing. This allows for 
minimal infrastructure and maintenance overhead for input parsing and repre- 
sentation. The parser component of ddSMTv2 has less than 100 LOC, and the 
ad-hoc semantic analysis accounts for less than 400 LOC. Adding support for 
new versions, dialects or non-standardized extensions of the SMT-LIB language 
does not require any changes to the parser. 
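
To give an impression of how little infrastructure such a representation needs, here is a minimal sketch of an S-expression parser that produces nested Python tuples and strings. It is not the ddSMTv2 parser; the function name `parse_sexprs` and the token rules (quoted symbols, string literals, no comment handling) are simplifying assumptions for this example.

```python
import re

# Tokens: parentheses, |quoted symbols|, "string literals" (with "" escapes),
# or runs of other non-space characters. Comments (;) are not handled here.
TOKEN = re.compile(r'\(|\)|\|[^|]*\||"(?:[^"]|"")*"|[^\s()]+')


def parse_sexprs(text):
    """Parse text into a tuple of S-expressions (nested tuples and strings)."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = tuple(stack.pop())
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return tuple(stack[0])
```

Because the parser knows nothing about commands, sorts or theories, a new SMT-LIB extension only changes which tuples appear, not how they are parsed.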

This is in stark contrast to deltaSMT and ddSMTv1, which both aim to get 
a full understanding of the input, with all its negative consequences: deltaSMT 
dedicates about 50% (more than 2000 LOC) of its Java code base and ddSMTv1 
even over 80% (3000 LOC) of its Python code base to parsing and input repre- 
sentation. Note that the former targets SMT-LIBv1, whereas the latter provides 
full SMT-LIBv2 support for most of the standardized theories. In both tools, 
parsing is a disproportionate part of the code base and extending the tools to 
support new theories or language constructs usually requires extensive modifica- 
tions to their input parsers. These modifications have significantly complicated 
or even inhibited the development of these tools in the past: adding support 
for the theory of floating-point arithmetic in ddSMTv1 required touching more 
than 1000 LOC; deltaSMT, on the other hand, has never seen full support of 
SMT-LIBv2 and fails to parse almost all inputs from our test set. 


5 Delta Debugging Strategies 


Our delta debugger ddSMTv2 implements two minimization strategies which we 
call ddmin and hierarchical. These two can be combined into a third strategy 
called hybrid, which aims to utilize the best of both worlds. All three strategies 
use the same input representation and have access to the same pool of available 
mutators. However, they differ in how they apply mutators to simplify the input. 

Algorithm 1: Main loop of ddmin strategy. 
Input: S-Expression input 
 1  do                                              // run to fixed point 
 2      simplified := False 
 3      for M ∈ mutators do 
 4          sexprs := {e | e ∈ input ∧ filter_M(e)}, size := |sexprs| 
 5          while size > 0 do 
 6              for subset ∈ partition(sexprs, size) do 
 7                  candidate := apply_M(input, subset) 
 8                  if check_result(candidate) then 
 9                      input := candidate, simplified := True 
10              sexprs := {e | e ∈ input ∧ filter_M(e)}, size := size/2 
11  while simplified 
12  return input 


Strategy ddmin. Our ddmin strategy implements a variant of the minimiza- 
tion strategy of ddSMTv1 and tries to perform simplifications on multiple S- 
expressions in the input in parallel. Algorithm 1 shows the main loop of this 
strategy. For each active mutator M, the algorithm first collects all S-expressions 
in the input that can be simplified by M (Line 4). Simplifications are applied 
and checked in a fashion similar to Zeller’s original ddmin algorithm [24]: the set 
of S-expressions sexprs is partitioned into subsets of size size; each S-expression 
e ∈ subset is substituted in input (Line 7) with a simplification suggested by M; 
the resulting simplified input candidate is then checked if it still triggers the 
original behavior (Line 8). Once all subsets of a given size are checked, sexprs 
is updated based on the current input and partitioned into smaller subsets. As 
soon as all subsets of size 1 were checked, the algorithm repeats these steps with 
the next available mutator. The main loop of strategy ddmin is run until a fixed 
point is reached, i.e., the input cannot be further simplified. Strategy ddmin 
applies mutators in two stages. The first stage targets top-level S-expressions 
(e.g., specific kinds of SMT-LIB commands) until a fixed point to aggressively 
simplify the input before applying more expensive mutators in the second stage. 
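
The main loop described above can be sketched in a few lines of Python. This is a simplified reading of Algorithm 1, not the ddSMTv2 implementation: the input is a flat tuple of top-level S-expressions, the mutator interface (`filter`, `apply`) and the `DropTopLevel` mutator are hypothetical, the two-stage scheme is omitted, and the `candidate != inp` guard is an addition of this sketch to rule out accepting a no-op forever.

```python
def partition(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


class DropTopLevel:
    """Hypothetical mutator: remove the selected top-level S-expressions."""

    def filter(self, sexpr):
        return True

    def apply(self, inp, subset):
        return tuple(e for e in inp if e not in subset)


def ddmin(inp, mutators, check_result):
    """Apply each mutator to shrinking subsets until a fixed point is reached."""
    simplified = True
    while simplified:                       # run to fixed point
        simplified = False
        for m in mutators:
            sexprs = [e for e in inp if m.filter(e)]
            size = len(sexprs)
            while size > 0:
                for subset in partition(sexprs, size):
                    candidate = m.apply(inp, subset)
                    # Keep the candidate only if it still triggers the behavior.
                    if candidate != inp and check_result(candidate):
                        inp, simplified = candidate, True
                sexprs = [e for e in inp if m.filter(e)]
                size //= 2
    return inp
```

In a real setting `check_result` would run the solver on the candidate and compare exit code and output against the original run; here it is just a predicate supplied by the caller.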


Strategy hierarchical. The main loop of the hierarchical strategy performs a simple 
breadth-first traversal of the S-expressions in the input, and applies all enabled 
mutators to every S-expression, as shown in Algorithm 2. Once a simplification 
is found (Line 7), all pending checks for the current S-expression are aborted and 
the breadth-first traversal continues with the simplified S-expression sexpr (Line 
9). This process is repeated until a fixed point is reached, i.e., until no further 
simplifications are found for any S-expression. The main simplification loop (Line 
3) is applied multiple times, with varying sets of mutators. In the initial stages, 
strategy hierarchical aims for aggressive minimization using only a small set of 
selected mutators, in the next-to-last stage it employs all but a few mutators that 
usually only have cosmetic impact, and in the last stage it includes all mutators. 
We observed that breadth-first traversal yields significantly better results than 
a depth-first traversal, most probably since it tends to favor simplifications on 
larger subtrees of the input. 

Algorithm 2: Core simplification loop of hierarchical strategy. 
Input: S-Expression input 
 1  do                                              // run to fixed point 
 2      simplified := False 
 3      for sexpr ∈ input do                        // BFS traversal 
 4          for M ∈ mutators do 
 5              if ¬filter_M(sexpr) then continue 
 6              for candidate ∈ apply_M(input, sexpr) do 
 7                  if check_result(candidate) then 
 8                      input := candidate, simplified := True 
 9                      continue with simplified sexpr in Line 3 
10  while simplified 
11  return input 
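
A compact Python sketch of the breadth-first loop of strategy hierarchical is given below. It is only an approximation under stated assumptions: S-expressions are nested tuples, nodes are addressed by index paths, the mutator interface (`filter`, `mutations`) and the `ReplaceByChild` mutator are hypothetical, the staged mutator sets are omitted, and every mutation is assumed to strictly shrink the node so that the loop terminates.

```python
from collections import deque


def get(root, path):
    """Return the sub-expression of root addressed by a tuple of indices."""
    for i in path:
        root = root[i]
    return root


def replace(root, path, new):
    """Return a copy of root with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return root[:i] + (replace(root[i], path[1:], new),) + root[i + 1:]


class ReplaceByChild:
    """Hypothetical mutator: replace a compound term by one of its arguments."""

    def filter(self, node):
        return isinstance(node, tuple) and len(node) > 1

    def mutations(self, node):
        return list(node[1:])


def hierarchical(inp, mutators, check_result):
    simplified = True
    while simplified:                        # run to fixed point
        simplified = False
        queue = deque([()])                  # BFS over paths, root first
        while queue:
            path = queue.popleft()
            restart = True
            while restart:                   # retry on the simplified node
                restart = False
                node = get(inp, path)
                for m in mutators:
                    if not m.filter(node):
                        continue
                    for new in m.mutations(node):
                        candidate = replace(inp, path, new)
                        if check_result(candidate):
                            inp, simplified, restart = candidate, True, True
                            break            # abort pending checks for this node
                    if restart:
                        break
            node = get(inp, path)
            if isinstance(node, tuple):      # enqueue children of the final node
                queue.extend(path + (i,) for i in range(len(node)))
    return inp
```

Because the root is visited first, a successful mutation near the top discards a large subtree in a single check, which matches the observation about breadth-first traversal favoring simplifications on larger subtrees.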


Strategy hybrid. This strategy combines strategies ddmin and hierarchical in a 
sequential portfolio manner. It first applies ddmin until a fixed point is reached, 
and then calls strategy hierarchical on the simplified input. We chose this order 
of strategies after observing in our experiments that ddmin is usually faster in 
simplifying input, while hierarchical often yields smaller inputs. 


6 Experimental Evaluation 


We compare the different strategies implemented in ddSMTv2 against the exist- 
ing delta debuggers ddsexpr, ddSMTv1, delta, linedd, and pyDelta. For this pur- 
pose, we compiled a set of SMT-LIB and SyGuS test cases from different sources. 
Every test case consists of an input file, a solver binary and command line con- 
figuration options for that binary. Our set of test cases includes those used in [18] 
and instances reported in bug reports of the SMT solvers Bitwuzla [19], CVC4 [3], 
Yices [12], and Z3 [17]. The test cases from [18] include issues encountered with 
development versions of the SMT solvers Boolector [20] and CVC4. Note that 
we excluded 9 test cases from this set because they did not trigger any faulty 
behavior on our experimental setup. In total, we collected 244 test cases consist- 
ing of inputs that trigger assertion failures, unexpected behavior or wrong solver 
answers. We performed all experiments on a cluster with Intel Xeon E5-2620v4 
CPUs with 2.1GHz and 128GB memory and used a 1h wall-clock time limit 
and 8 GB of memory for each delta debugger/test case pair. Table 1 summarizes 
the results on all 244 test cases. 

Table 1. Results summarized over all 244 test cases. 

                               | ddsexpr | ddSMTv1 | delta | linedd | pyDelta | ddmin | hier. | hybrid 
Parse Errors                   |       0 |      54 |     1 |      0 |       0 |     0 |     0 |      0 
Incorrect Output               |       0 |       0 |     1 |      1 |       1 |     0 |     0 |      0 
Timeouts                       |     155 |      81 |   128 |      3 |     126 |     6 |   122 |      6 
Any Simplification             |     219 |     175 |   114 |    209 |     119 |   242 |   242 |    242 
Smallest Output                |       2 |      10 |     0 |      3 |      58 |    89 |    59 |    168 
Avg. Reduction (%)             |     -40 |     -63 |  +288 |    -26 |      -4 |   -75 |  +571 |    -77 
Avg. Reduction w/o ERR (%)     |     -40 |     -80 |  +291 |    -26 |      -4 |   -75 |  +571 |    -77 
Avg. Reduction w/o TO/ERR (%)  |     -32 |     -73 |  +617 |    -26 |     -57 |   -76 |   -59 |    -79 

A first immediate observation is the value of a simpler and more generic 
parser: ddSMTv1 fails to parse more than 20% of the inputs, mostly due to the 
lack of support for newer standard and non-standard SMT-LIBv2 constructs. 
Examples include the check-sat-assuming command, algebraic datatypes, 
some operators of the theory of strings, the SyGuS language extension, and 
the non-standardized extension to encode problems of separation logic. We also 
observe that each strategy of ddSMTv2 simplifies significantly more inputs than 
any other tool. The only inputs that could not be simplified by ddSMTv2 were 
already very small (83 and 98 bytes). Strategy hybrid achieves the smallest 
output on 168 test cases (more than two thirds) and an average reduction in file 
size by 77% (79% not counting timeouts), while only timing out on 6 test cases. 

Some debuggers increase the input size (in bytes), indicated by positive 
reductions. Eliminating let binders or inlining function definitions frequently increases 
the size of the input. A positive reduction occurs if the debugger times out 
while performing such simplifications, or if it is unable to find viable simplifica- 
tions after the input size increased. In rare individual cases, incorrect outputs 
were produced that did not trigger the issue under investigation. This happened 
because of the unchecked removal of unused variables (delta), incorrect handling 
of timeouts (linedd) and defective handling of quoted symbols (pyDelta). 

The hybrid strategy performs significantly better than ddSMTv1, even on the 
set of instances that both can reduce without any timeout or error. On these 
commonly reduced instances (107), the results from hybrid are smaller in most 
cases (99), and on average smaller by about a third. 

On inputs that both ddmin and hybrid reduce without timeout or error (238), 
the hybrid strategy produces smaller outputs on 125 cases and never generates 
larger results. On average, over all 238 inputs the outputs are about 5% smaller. 
This may seem marginal, but can make a big difference for users in practice. 

Figures 1-2 show the direct comparison of ddmin, hierarchical, hybrid and 
ddSMTv1 in terms of output size and overall runtime as scatter plots, where a dot 
represents a test case and dots on the “T” lines correspond to timeouts. While 
strategy hierarchical tends to produce smaller output files, it is considerably 
slower than ddmin and runs into the time limit on 116 more test cases. As a 
result of this observation, we combined both strategies into the hybrid strategy, 
which first uses ddmin to quickly reduce the input before applying hierarchical to 
achieve maximum reduction. Comparing hybrid to the best of strategies ddmin 
and hierarchical, we see that hybrid usually achieves the smallest output and is 
only slower on test cases that are comparably fast to minimize. If the runtime 
of ddSMTv2 exceeds a few minutes, there is no discernible performance penalty. 
Fig. 1. Output size (in % of original size). 
Fig. 2. Overall runtime (in seconds). 


In comparison to ddSMTv1, strategy hybrid obtains significantly smaller out- 
put files on almost all inputs while having a similar runtime on inputs where 
ddSMTv1 terminates within the given time limit. 

All strategies can use multiple worker processes to perform checks 
asynchronously. Though there is potential for significant runtime improvements, the 
current impact is rather limited. With 8 worker processes, hierarchical achieves on 
average a 2x speedup, and up to 6x speedup on a few instances. Both ddmin and 
hybrid, on the other hand, slow down on average (by 25% and 9%, respectively). 


7 Conclusion 


We have presented ddSMTv2, a delta debugger for the SMT-LIBv2 language 
and its dialects. Our tool improves substantially over its predecessor ddSMTv1, 
which is the current de-facto standard in the SMT community for delta debug- 
ging SMT-LIB input. We have shown how a more generic parser approach not 
only lowers the maintenance overhead of the tool itself, but also makes the 
delta debugger more robust and easier to extend for future SMT-LIB extensions. 
Our experimental evaluation has shown that ddSMTv2 significantly outperforms 
existing delta debugging tools on a variety of real-world test cases from dif- 
ferent SMT solvers. Further, our experiments suggest that combining different 
minimization strategies is beneficial in practice to quickly obtain small output 
files. 
Abstract. We study the problem of learning a finite union of integer 
(axis-aligned) hypercubes over the d-dimensional integer lattice, i.e., 
hypercubes whose edges are parallel to the coordinate axes. This is a natural 
generalization of the classic problem in computational learning theory 
of learning rectangles. We provide a learning algorithm with access to 
a minimally adequate teacher (i.e., membership and equivalence oracles) 
that solves this problem in polynomial time, for any fixed dimension d. 
Over a non-fixed dimension, the problem subsumes the problem of learning 
DNF boolean formulas, a central open problem in the field. We 
also provide extensions to handle infinite hypercubes in the union, and 
show how subset queries could improve the performance of 
the learning algorithm in practice. Our problem has a natural application 
to the problem of monadic decomposition of quantifier-free integer 
linear arithmetic formulas, which has been actively studied in recent 
years. In particular, a finite union of integer hypercubes corresponds to 
a finite disjunction of monadic predicates over integer linear arithmetic 
(without modulo constraints). Our experiments suggest that our learning 
algorithms substantially outperform the existing algorithms. 


1 Introduction 


Suppose that we are interested in finding a formula $\varphi(\bar{x})$ over some theory $T$ 
(e.g. integer linear arithmetic) to “capture” a certain phenomenon, which in 
verification could be, for instance, an invariant that a program satisfies some 
safety property. The process of discovering $\varphi$ can be captured by the notion 
of a learning algorithm by allowing certain types of queries as an interface to 
some teacher [3]. Most standard learning frameworks can be captured in this 
way. Here are some examples. Valiant’s well-known notion of PAC-learning can 
be captured by an oracle that returns a new random sample from an unknown 
distribution. Angluin’s well-known notion of exact learning [2,3] can be cap- 
tured by an interaction with the so-called minimally adequate teachers, which 
can answer membership and equivalence queries. This has many applications in 
verification, e.g., verification of parameterized systems [10,20,23] and compositional 
verification [9]. Another learning framework that has become very popular 
in verification is CEGIS (Counterexample Guided Inductive Synthesis) [21,27], 
wherein a learning algorithm can ask equivalence queries, but expect various 
types of “constraint-like” counterexamples (e.g. implication counterexamples) 
to be returned by the teacher. This is of course in contrast to Angluin’s exact 
learning setting, wherein the teacher may return only a positive/negative coun- 
terexample (a point in the symmetric difference of the target concept and the 
hypothesis). 

In this paper, we study the problem of learning sets of points over the d- 
dimensional integer lattice that can be expressed as a finite union of integer 
(axis-aligned, a.k.a. rectilinear) hypercubes, i.e., whose edges are parallel to the 
coordinate axes. Such a concept class of course forms a strict subclass of sets 
of points that are definable by a formula $\varphi(x_1, \ldots, x_d)$ in the integer linear 
arithmetic (a.k.a. semilinear sets), which have been addressed in several papers 
including [1,17,28], whose PAC-learnability is as hard as PAC-learning boolean 
formulas in DNF [16]—a long-standing open problem in learning theory—when 
binary representations are permitted (even over dimension one [1]). That said, 
finite unions of integer hypercubes are a concept class that naturally arises in 
computer science. Below we mention a few examples. 

The problem of learning rectangles (2-cubes) and its generalization to d dimensions 
is a classic example in computational learning theory, e.g., see [16,22]. Maass 
and Turán [22] showed, for example, that d-dimensional rectilinear cubes can be 
learned in polynomial time with $O(\log n)$ queries, where the corners of the cubes 
are represented in binary. The authors posed as an open problem whether one can learn 
a union of two (possibly overlapping) rectangles with only $O(\log n)$ equivalence 
queries. Chen [11] showed that this can be learned with 2 equivalence queries and 
$O(d \cdot \log n)$ membership queries. Later, Chen and Ameur [12] showed that there is 
a polynomial-time algorithm using at most $O(\log^2 n)$ queries. The same paper left 
as an open problem whether there is a polynomial-time exact learning algorithm that 
learns finite unions of rectilinear cubes over a fixed dimension d. In this paper, we 
answer this in the positive, and further show that this can be extended to allow 
infinite rectilinear hypercubes, which in turn allow interesting applications in formal 
verification, as we discuss below. 

Finite unions of rectilinear cubes arise naturally in program analysis and ver- 
ification. Here we mention two examples. First, solving games over a large game 
graph has benefited from constraint-based approaches, where winning regions 
can be succinctly represented and checked efficiently [6]. For example, the dis- 
cretization of the Cinderella-Stepmother problem [6] admits winning regions 
that may be represented by a union of a small number of cubes. Secondly, verification 
algorithms benefit from optimization techniques like monadic decomposition [29], 
where the aim is the rewriting of a given quantifier-free SMT 
formula $\varphi(x_1, \ldots, x_n)$ into an equivalent boolean combination of monadic 
predicates $u(x_i)$ in some special form, i.e., typically in DNF [5,7,15,19], or by an 
if-then-else formula [29], which could sometimes be exponentially smaller than 
the DNF equivalent representation. Veanes et al. [29] provided a generic semi- 
decision procedure for performing this monadic decomposition as an if-then-else 
formula, which works regardless of the base theory. The restriction of the prob- 
lem to the quantifier-free theory of integer linear arithmetic (with and without 
extra modulo constraints) was studied in [15], wherein the problem was shown to 
be coNP-complete and a monadic decomposition could be exponentially large in 
general. For the subcase without modulo constraints, a monadic decomposition 
in DNF corresponds precisely to a finite union of (possibly infinite) rectilinear 
hypercubes, which is the subject of this paper. We describe below how oracles 
for membership and equivalence (as well as more powerful queries like subset 
queries) admit a fast implementation via an SMT-solver, which enables our learning 
algorithms to be applied to compute such a monadic decomposition. 


Contributions. We study the problem of learning finite unions of rectilinear 
hypercubes (over $\mathbb{Z}^d$) in Angluin’s exact learning framework with membership 
and equivalence queries [2,3]. Our result is a polynomial-time exact learning 
algorithm for learning finite unions of rectilinear hypercubes over $\mathbb{Z}^d$ for fixed 
$d$. This answers an open problem of [12]. As observed in [12], over non-fixed $d$, 
this problem generalizes DNF since each term can be seen as a hypercube over 
$\{0,1\}^d$. That is, without fixing $d$, the problem is as hard as learning unrestricted 
DNF, which is well-known to be a major open problem in computational learning 
theory [4]. 
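
The correspondence between DNF terms and hypercubes over $\{0,1\}^d$ can be made concrete with a small Python sketch. The encoding of a term as a dictionary from 0-based variable indices to required truth values, and the function names, are assumptions of this example.

```python
def term_to_cube(term, d):
    """Map a conjunction of literals to the corners (lo, hi) of a cube in {0,1}^d.

    `term` maps a 0-based variable index to True (positive literal) or
    False (negated literal); variables not mentioned are unconstrained.
    """
    lo = tuple(1 if term.get(i) is True else 0 for i in range(d))
    hi = tuple(0 if term.get(i) is False else 1 for i in range(d))
    return lo, hi


def satisfies(assignment, term):
    """Check whether a 0/1 assignment satisfies every literal of the term."""
    return all(bool(assignment[i]) == value for i, value in term.items())


def in_cube(v, lo, hi):
    """Component-wise test lo <= v <= hi."""
    return all(l <= c <= h for l, c, h in zip(lo, v, hi))
```

For instance, the term $x_1 \wedge \neg x_3$ over three variables is `{0: True, 2: False}` and yields the cube with corners `(1, 0, 0)` and `(1, 1, 0)`; a DNF formula is then the union of the cubes of its terms.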

In view of applying our learning algorithm to the monadic decomposition 
problem [15,29] for quantifier-free integer linear arithmetic formulas, we consider 
two extensions. Firstly, we allow infinite hypercubes. For example, in 
dimension 1, these would include infinite intervals like $[7,\infty)$, which would 
correspond to the formula $x \ge 7$. Secondly, we observe that the subset query (i.e. 
checking if the target concept includes a given finite union H of hypercubes) 
is not an expensive query for performing monadic decomposition, i.e., it would 
correspond to a single satisfiability check of a quantifier-free integer linear arith- 
metic formula, which can be handled easily by an SMT-solver. Subset queries 
belong to one of the standard types of queries in Angluin’s active learning frame- 
work, e.g., see [3]. For this reason, we provide an optimization of our learning 
algorithm by means of subset queries. 
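
To make this reduction explicit, suppose (as an assumption for this sketch) that the target concept $X$ is given by a quantifier-free formula $\varphi(\bar{x})$ and that the hypothesis $H$ is a union of $n$ finite cubes, the $i$-th one consisting of all points between a lower corner $\underline{v}_i$ and an upper corner $\overline{v}_i$. The subset query then amounts to a single unsatisfiability check:

```latex
\psi_H(\bar{x}) \;=\; \bigvee_{i=1}^{n} \bigwedge_{j=1}^{d}
  \big( \underline{v}_i[j] \le x_j \le \overline{v}_i[j] \big),
\qquad
H \subseteq X \iff \psi_H(\bar{x}) \wedge \neg \varphi(\bar{x}) \text{ is unsatisfiable.}
```

Any model of $\psi_H \wedge \neg\varphi$ is a point of $H \setminus X$ and can be returned as a counterexample to the subset query.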

We implemented these learning algorithms (vanilla and various optimizations 
including subset queries and “unary/binary acceleration”), using Z3 [26] as the 
backend for answering equivalence and subset queries (each a satisfiability check 
of a quantifier-free formula). We have performed micro-benchmarking to stress-test 
our algorithms against the generic monadic decomposition procedure of 
[29], which also uses Z3 as the backend, using various geometric objects over 
$\mathbb{Z}^d$ as benchmarks. Our experiments suggest that our algorithms substantially 
outperform the generic procedure. 


Organization. Preliminaries are in Sect. 2. We present the overshooting algorithm 
that witnesses polynomial learnability of finite unions of rectilinear cubes over 
a fixed dimension d with membership and equivalence queries in Sect. 3. In Sect. 4, we 
provide two extensions: (1) how subset queries could help speed up the over- 
shooting algorithm, (2) how the algorithm could be extended to handle infinite 
cubes. Applications to monadic decomposition and experiments are presented in 
Sect. 5. We conclude in Sect. 6. 

We refer the reader to the technical report [25] when proofs are omitted and 
to the artifact [24] for implementation and benchmark details. 


2 Preliminaries 


We introduce below some common mathematical notations: $\mathbb{N}$ and $\mathbb{Z}$ are the 
sets of natural numbers and integers, respectively. For $a, b \in \mathbb{Z}$, we write $[a,b] = 
\{i \mid a \le i \le b\}$. For any set $X$, we denote its power-set by $\mathcal{P}(X)$ and its cardinality by 
$|X| \in \mathbb{N} \uplus \{\infty\}$. Given two sets $A, B$, the symmetric difference is written $A \Delta B = 
(A \setminus B) \cup (B \setminus A)$. 

When analyzing the complexity of the presented algorithms, we assume binary 
encoding for any number $n \in \mathbb{Z}$ that is part of the input of the considered 
algorithms, namely, $\mathrm{size}(n) = 1 + \lceil \log(|n|+1) \rceil$, where $\log$ is the base 2 logarithm. 
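
As a small worked example of this size measure, the following Python sketch computes it with integer arithmetic only. It assumes the bracket in the definition denotes a ceiling, and it uses the fact that for $m \ge 0$ the value $\lceil \log_2(m+1) \rceil$ equals the bit length of $m$; the function name `size` mirrors the text.

```python
def size(n: int) -> int:
    """Binary size of an integer: one sign bit plus ceil(log2(|n| + 1)).

    For m >= 0, ceil(log2(m + 1)) equals m.bit_length(), so no floating
    point logarithm is needed.
    """
    return 1 + abs(n).bit_length()
```

For example, `size(0)` is 1, `size(7)` is 4 and `size(-8)` is 5, so the measure grows logarithmically in the magnitude of the number.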


Hypercubes. For a fixed dimension $d \in \mathbb{N}$, we consider the discrete lattice $\mathbb{Z}^d$. A 
point $v \in \mathbb{Z}^d$ can be described by its coordinates $v[k]$ for $k \in [1,d]$. Let $v[k/a]$ 
denote the vector $v$ where the $k$-th coordinate has been replaced by $a \in \mathbb{Z}$. The 
notation $0^d = (0, \ldots, 0) \in \mathbb{Z}^d$ denotes the origin, or simply $0$ when the dimension 
is clear from context. We use standard notation for component-wise addition 
and scalar multiplication. In particular, for $a \in \mathbb{Z}$, $v + a \cdot v'$ denotes the vector 
$v'' \in \mathbb{Z}^d$ such that for all $i$, $v''[i] = v[i] + a \cdot v'[i]$. For $1 \le i \le d$, we write $e_i$ for 
the $i$-th elementary vector, $e_i = 0[i/1]$. We shall mostly be using the standard 
component-wise order $\le$ over vectors in $\mathbb{Z}^d$: $v \le v'$ iff for all $i$, $v[i] \le v'[i]$. 
We finally denote the size of a vector as the sum of the sizes of its components: 
$\mathrm{size}(v) = \sum_{i=1}^{d} \mathrm{size}(v[i])$, for any $v \in \mathbb{Z}^d$. 

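Our main study focuses on rectilinear hypercubes (cubes for short), i.e., any 
set of points of the form $C = \{v \mid \underline{v} \le v \le \overline{v}\}$ for some $\underline{v} \le \overline{v} \in \mathbb{Z}^d$, 
written $\mathrm{Cube}(\underline{v}, \overline{v})$. The 
size of $C$ is uniquely defined as $\mathrm{size}(C) = \mathrm{size}(\underline{v}) + \mathrm{size}(\overline{v})$. In contrast, an 
arbitrary finite set $X$ has no unique representation as a finite union of cubes; 
therefore we define its size as the size of its best representation: 

$$\mathrm{size}(X) = \min \left\{ \sum_{i=1}^{n} \mathrm{size}(\underline{v}_i) + \mathrm{size}(\overline{v}_i) \;\middle|\; \exists n, \underline{v}_1, \ldots, \overline{v}_n : X = \bigcup_{i=1}^{n} \mathrm{Cube}(\underline{v}_i, \overline{v}_i) \right\}$$

We adopt here a worst-case analysis approach: our later reasoning and 
complexity analysis are valid for any representation of $X$, and hence in particular 
for its best representation. 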
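
The non-uniqueness of representations can be seen on a tiny example. The Python sketch below assumes that a cube is stored as a pair of corner tuples and that integer sizes use the ceiling reading of the definition above; all names (`rep_size`, `points`, `REP_A`, `REP_B`) are illustrative.

```python
from itertools import product


def size_int(n):
    """Binary size of an integer, 1 + ceil(log2(|n| + 1))."""
    return 1 + abs(n).bit_length()


def size_vec(v):
    """Size of a vector: sum of the sizes of its components."""
    return sum(size_int(c) for c in v)


def rep_size(cubes):
    """Size of one particular representation as a list of (lo, hi) corner pairs."""
    return sum(size_vec(lo) + size_vec(hi) for lo, hi in cubes)


def points(cubes):
    """Enumerate the (finite) set of lattice points covered by the cubes."""
    pts = set()
    for lo, hi in cubes:
        pts.update(product(*(range(l, h + 1) for l, h in zip(lo, hi))))
    return pts


# Two representations of the same L-shaped set in Z^2.
REP_A = [((0, 0), (2, 0)), ((0, 0), (0, 2))]   # overlapping cubes
REP_B = [((0, 0), (2, 0)), ((0, 1), (0, 2))]   # disjoint cubes
```

Both lists describe the same five points, yet `rep_size(REP_A)` is 12 while `rep_size(REP_B)` is 13, which is why the size of a set is defined through a minimum over its representations.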
Learning Model. We first recall some standard definitions from computational 
learning theory; for more, see [16]. Fix a countable base set $D = \bigcup_{i=1}^{\infty} D_i$, where 
the sets $D_i$ are pairwise disjoint. The problem of learning boolean formulas in 
DNF uses $D_i = \{0,1\}^i$, i.e., the set of all binary sequences of length $i$, which can 
be thought of as the set of all assignments to a boolean function over $x_1, \ldots, x_i$. 
The learning problem in this paper uses $D_i = \mathbb{Z}^i$. A concept $X$ is simply a 
subset of $D_i$, for some $i \in \mathbb{Z}_{>0}$. For example, when $D_i = \{0,1\}^i$, a concept is 
simply a boolean function over $x_1, \ldots, x_i$. When we speak of a learning problem, 
we always have a fixed set of representations in mind. For example, when we 
speak of learning boolean formulas in DNF (Disjunctive Normal Form), the 
representation $\varphi_X$ of a boolean function $X$ has to be a formula over $x_1, \ldots, x_i$ in 
DNF. For example, $X$ could be a boolean function, whereas $\varphi_X$ is a DNF formula 
representing $X$. Note that a concept could admit many possible representations. 
A concept class $\mathcal{C} = \bigcup_{i=1}^{\infty} \mathcal{C}_i$ is a set of concepts, where $\mathcal{C}_i \subseteq \mathcal{P}(D_i)$. For example, 
$\mathcal{C}_i$ could be the set of boolean functions over variables $x_1, \ldots, x_i$. When the set of 
representations for $\mathcal{C}$ is fixed (e.g. DNF for representing boolean functions), we 
could define $\mathrm{size}(X)$ of the concept $X$ to be the size of the smallest representation 
of $X$. In this paper, we are dealing with the concept class $\mathcal{C}_d \subseteq \mathcal{P}(\mathbb{Z}^d)$ of sets of 
integer points that can be represented as a finite union of rectilinear hypercubes 
over $\mathbb{Z}^d$. Earlier in this section we have defined this concept, as well as the size of 
the representation. To avoid notational clutter, we will often denote the concept 
class $\mathcal{C}_d$ by $\mathcal{C}$ because our algorithm typically assumes that $d$ is fixed. 

In Angluin’s active learning framework [2,3], the learner has access to oracles 
(a.k.a. teachers) that could provide hints about the target concept X to the 
learner. A minimally adequate teacher must be able to answer membership and 
equivalence queries. 


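Definition 1 (M+EQ Oracles). Consider some target concept $X \in \mathcal{C}_d$ for 
some concept class $\mathcal{C} = \bigcup_{d=1}^{\infty} \mathcal{C}_d$ and let $\bot, \top \notin D$ be two fresh symbols. 


- A membership oracle (M) for $X$ is a function $\Phi_X : D_d \to \{\top, \bot\}$, which 
outputs $\top$ on input $v$ iff $v \in X$. 

- An equivalence oracle (EQ) for $X$ is a function $\Psi_X : \mathcal{C}_d \to D_d \uplus \{\top\}$ such 
that for all hypotheses $H \in \mathcal{C}$, $\Psi_X(H) \in (H \Delta X) \uplus \{\top\}$ and $\Psi_X(H) = \top$ 
implies $H = X$. 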


Intuitively, an equivalence oracle tells, for any hypothesis $H \in \mathcal{C}$, whether $H = 
X$. If yes, $\top$ is returned; if not, it provides a counterexample, namely a point in 
the symmetric difference. Angluin has considered other types of queries as well 
in her framework including subset/superset queries and difference queries (e.g. 
see her excellent survey [3]). We will use the subset queries in Sect. 4. 
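
For intuition, the two oracles of a minimally adequate teacher can be mocked up in Python for a finite target given explicitly as a set of points. This is only a toy model under stated assumptions: concepts and hypotheses are finite Python sets of integer tuples, the membership oracle returns a Python boolean instead of $\top$/$\bot$, the sentinel `TOP` stands for $\top$, and the equivalence oracle returns the lexicographically smallest counterexample.

```python
TOP = object()   # stands for the fresh symbol "top" (hypothesis is correct)


def membership_oracle(target):
    """Return a function answering whether a point belongs to the target."""
    return lambda v: v in target


def equivalence_oracle(target):
    """Return a function answering equivalence queries for the target."""
    def ask(hypothesis):
        diff = target ^ hypothesis          # symmetric difference
        return min(diff) if diff else TOP   # counterexample, or TOP if equal
    return ask
```

A counterexample returned by `ask` is positive (in the target but not in the hypothesis) or negative (the other way round); the learner can tell which by one membership query.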

A learning algorithm $A$ is said to learn the concept class $\mathcal{C} = \bigcup_{d=1}^{\infty} \mathcal{C}_d$ if, given 
$d$ as input and any unknown target concept $X$, it terminates and outputs a 
representation of $X$ after a finite amount of interaction with the oracles. Assuming 
that the oracle always returns the shortest counterexamples, its running time 
is defined to be the number of steps (measured in $d$ and $\mathrm{size}(X)$) that $A$ takes to 
output a representation of $X$. The complexity $\mathrm{comp}(d, \mathrm{size}(X))$ of $A$ measures 
the number of steps taken in the worst case for all $d$ and $\mathrm{size}(X)$. It runs in 
polynomial time if comp is a polynomial function. It remains a long-standing open 
problem in computational learning theory whether there is a polynomial-time learning 
algorithm for boolean formulas represented in DNF, which is true for almost all major models 
including exact learning and PAC (see [4]). Over geometric concepts including 
hypercubes and semilinear sets, the dimension $d$ is sometimes considered a fixed 
parameter, e.g., see [1,12,17,22]. 


3 Minimally Adequate Teacher 


We restrict first our attention to the minimally adequate teacher setting where 
only a membership and equivalence oracle are provided, and provide construc- 
tions for intermediate procedures that can be interpreted as oracles. 


3.1 Corner Oracle 
At the heart of our learning algorithm is the concept of corners: 


Definition 2. Given a set of points $X \subseteq \mathbb{Z}^d$, a maximal corner (resp. minimal 
corner) of $X$ is a point $v \in X$ maximal (resp. minimal) with respect 
to the component-wise ordering $\le$. We write $\overline{\mathrm{Corners}}(X)$ and $\underline{\mathrm{Corners}}(X)$ for the 
sets of maximal and minimal corners, respectively, and write $\mathrm{Corners}(X) = 
\overline{\mathrm{Corners}}(X) \cup \underline{\mathrm{Corners}}(X)$. 


Given a membership oracle for some $X \in \mathcal{C}$ containing $0$, Algorithm 1 returns 
some maximal corner of a given finite subset. Intuitively, for each coordinate i, 
a binary search is made until a border of X is eventually found. More precisely, 
we provide the following complexity analysis. 


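Algorithm 1. Binary search for a maximal corner, assuming $0 \in X$ 
Ensure: Returned value is a maximal corner of $X$ 
Require: $0 \in X$; $\Phi_X$ a membership oracle for $X$ 
function FINDMAXCORNER($\Phi_X$) 
    $i \leftarrow 0$; $v \leftarrow 0$ 
    while $i < d$ do 
        $i \leftarrow i + 1$; $k \leftarrow 1$; $l \leftarrow 1$ 
        if $\Phi_X(v + e_i)$ then 
            while $\Phi_X(v + k \cdot e_i)$ do 
                $l \leftarrow k$; $k \leftarrow 2k$ 
            while $k - l > 1$ do 
                if $\Phi_X(v + \lfloor (k + l)/2 \rfloor \cdot e_i)$ then 
                    $l \leftarrow \lfloor (k + l)/2 \rfloor$ 
                else 
                    $k \leftarrow \lfloor (k + l)/2 \rfloor$ 
            $v \leftarrow v + l \cdot e_i$; $i \leftarrow 0$ 
    return $v$ 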
Proposition 1. Let $\Phi_X$ be a membership oracle for $X = \bigcup_{i=1}^{n} \mathrm{Cube}(\underline{v}_i, \overline{v}_i)$ 
and assume $0 \in X$. Then FINDMAXCORNER($\Phi_X$) terminates after 
$O\left(\sum_{i=1}^{n} \mathrm{size}(\overline{v}_i)\right)$ queries and returns some $\overline{v} \in \overline{\mathrm{Corners}}(X)$. 
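
The binary search of Algorithm 1 can be sketched in Python as follows. This is one reading of the listing, under the assumptions that the set is finite (so the doubling phase stops), that it contains the origin, and that the membership oracle is a Python function from integer tuples to booleans; coordinates are 0-based here, so resetting `i` to 0 restarts the sweep from the first coordinate.

```python
def find_max_corner(member, d):
    """Starting from the origin, climb to a point that cannot be increased
    in any single coordinate while staying inside the set."""
    v = [0] * d
    i = 0
    while i < d:
        def probe(k, i=i):
            # Membership query for the point v + k * e_i.
            w = list(v)
            w[i] += k
            return member(tuple(w))

        if probe(1):
            lo, hi = 1, 1
            while probe(hi):            # doubling phase: v + lo*e_i stays inside
                lo, hi = hi, 2 * hi
            while hi - lo > 1:          # binary search for a border in (lo, hi)
                mid = (lo + hi) // 2
                if probe(mid):
                    lo = mid
                else:
                    hi = mid
            v[i] += lo
            i = 0                       # restart from the first coordinate
        else:
            i += 1
    return tuple(v)
```

Each successful step uses a number of queries logarithmic in the distance travelled, which is where the dependence on the binary sizes of the corners in Proposition 1 comes from.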


This algorithm provides a partial implementation of the following oracle: 


Definition 3. Given $X \in \mathcal{C}$, a corner oracle for $X$ is any function
$O_X : X \to \underline{\mathrm{Corners}}(X) \times \overline{\mathrm{Corners}}(X)$.


A complete implementation of this oracle is provided by noticing that mem- 
bership oracles can easily be composed: 


Remark 1. Assume $\Phi_A$ and $\Phi_B$ are two given membership oracles, respectively
for two arbitrary sets $A$ and $B$, and $f : \mathbb{Z}^d \to \mathbb{Z}^d$. One can build membership
oracles for $A \cup B$, $A \cap B$, $A \,\Delta\, B$, $A \setminus B$ and $f(A)$. In particular:

- By instantiating $f : v \mapsto -v$, the previous procedure applied on $\Phi_{f(A)}$ returns
  some $v \in \overline{\mathrm{Corners}}(-A)$, so $-v \in \underline{\mathrm{Corners}}(A)$.

- For any $v_0 \in A$ and $f : v \mapsto v - v_0$, $\Phi_{f(A)}$ is a membership oracle for
  $A - v_0 = \{v \mid v + v_0 \in A\}$ containing $0$, so FINDMAXCORNER($\Phi_{f(A)}$) returns
  some $v \in \overline{\mathrm{Corners}}(A - v_0)$, so $v + v_0 \in \overline{\mathrm{Corners}}(A)$.

In both cases, notice that $\mathrm{size}(f(A)) \le \mathrm{size}(A) + \mathrm{size}(v_0) \le 2\,\mathrm{size}(A)$.
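The compositions of Remark 1 are straightforward when oracles are modelled as Python predicates. The sketch below is illustrative only; the helper names (`cube`, `union`, `negate`, `translate`, and so on) are our own, and the two maps $v \mapsto -v$ and $v \mapsto v - v_0$ are handled through their inverses.

```python
from typing import Callable, Tuple

Point = Tuple[int, ...]
Oracle = Callable[[Point], bool]


def cube(lo: Point, hi: Point) -> Oracle:
    """Membership oracle of Cube(lo, hi)."""
    return lambda v: all(a <= x <= b for a, x, b in zip(lo, v, hi))


def union(a: Oracle, b: Oracle) -> Oracle:
    return lambda v: a(v) or b(v)


def intersection(a: Oracle, b: Oracle) -> Oracle:
    return lambda v: a(v) and b(v)


def difference(a: Oracle, b: Oracle) -> Oracle:
    return lambda v: a(v) and not b(v)


def symmetric_difference(a: Oracle, b: Oracle) -> Oracle:
    return lambda v: a(v) != b(v)


def negate(a: Oracle) -> Oracle:
    """Oracle for -A: w belongs to -A exactly when -w belongs to A."""
    return lambda v: a(tuple(-x for x in v))


def translate(a: Oracle, v0: Point) -> Oracle:
    """Oracle for A - v0 = {v | v + v0 in A}."""
    return lambda v: a(tuple(x + y for x, y in zip(v, v0)))


A = cube((2, 2), (4, 5))
B = cube((4, 4), (6, 6))
shifted = translate(A, (2, 2))   # contains the origin because (2, 2) is in A
mirrored = negate(A)
```

Each combinator costs at most two queries to the underlying oracles per call, which is why the composed oracles can be used freely inside the corner search.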


In the sequel we write $\Phi_C$ for the membership oracle of any set $C$ obtained by
composing sets whose oracles are provided. We also assume having constructed
the two procedures FINDMAXCORNER($v$, $\Phi_X$) and FINDMINCORNER($v$, $\Phi_X$).


3.2 Overshooting Algorithm 


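Algorithm 2. Overshooting algorithms
Require: $\Phi_X$ membership oracle for $X$, $\Psi_X$ equivalence oracle for $X$
function LEARNCUBES($\Phi_X$, $\Psi_X$)
    $H \leftarrow \emptyset$
    repeat
        $v \leftarrow \Psi_X(H)$
        if $v \ne \top$ then $H \leftarrow$ REFINE($H$, $v$, $\Phi_X$)
    until $v = \top$
function REFINEADDREMOVE($H$, $v$, $\Phi_X$)
    if $\Phi_X(v)$ then
        $\underline{v} \leftarrow$ FINDMINCORNER($v$, $\Phi_{X \setminus H}$)
        $\overline{v} \leftarrow$ FINDMAXCORNER($\underline{v}$, $\Phi_{X \setminus H}$)
        return $H \cup \mathrm{Cube}(\underline{v}, \overline{v})$
    else
        $\underline{v} \leftarrow$ FINDMINCORNER($v$, $\Phi_{H \setminus X}$)
        $\overline{v} \leftarrow$ FINDMAXCORNER($\underline{v}$, $\Phi_{H \setminus X}$)
        return $H \setminus \mathrm{Cube}(\underline{v}, \overline{v})$
function REFINESYM($H$, $v$, $\Phi_X$)
    $\underline{v} \leftarrow$ FINDMINCORNER($v$, $\Phi_{X \Delta H}$)
    $\overline{v} \leftarrow$ FINDMAXCORNER($\underline{v}$, $\Phi_{X \Delta H}$)
    return $H \,\Delta\, \mathrm{Cube}(\underline{v}, \overline{v})$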


The core loop of the learning algorithm is presented in the LEARNCUBES function
of Algorithm 2. The hypothesis is initially empty, and is later refined as
long as a counterexample is returned. How should the hypothesis be refined given a
counterexample? Two implementations of REFINE are provided, namely REFINESYM
and REFINEADDREMOVE, giving rise to two variants of the algorithm. In both
cases, the refinement takes a counterexample as input and uses the corner
oracle to build a cube $C$. In the former variant, the symmetric difference between
the current hypothesis and $C$ is taken, while in the latter, $C$ is either added to or
removed from the hypothesis.


[Figure 1, panels: (a) Step 1: add; (b) Step 2: remove; (c) Step 3: remove. Legend: counterexample $v$, minimal corner $\underline{v}$, maximal corner $\overline{v}$, search space, hypothesis, learned cube to add, learned cube to remove.]

Fig. 1. Possible run of the overshooting algorithm on two cubes in 2 dimensions


An example run of the REFINEADDREMOVE variant is depicted in Fig. 1.
While the upper diagrams represent the search space used by the corner oracles,
the lower diagrams depict the resulting hypothesis after refinement. Initially,
the hypothesis is empty (not represented) so the search space coincides with the
target set $X$, which can be represented as a union of two overlapping cubes.
A counterexample $v \in X \setminus H$ is therefore returned by the equivalence oracle.
As $v \in X$, the refinement procedure adds some cube by searching the state
space $X \setminus H = X$ around $v$. A cube that is too large is then added to the hypothesis,
and a negative counterexample $v \in H \setminus X$ is then returned. The search space
is now $H \setminus X$ and the algorithm aims at removing some smaller cube from the
hypothesis. After two removals, the final hypothesis coincides with the target.
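The overshoot-then-remove behaviour can be reproduced on a tiny instance. The sketch below is a simplification of ours, not the prototype: the hypothesis is stored as an explicit set of points, the equivalence oracle is simulated by comparing sets and returning the smallest differing point, and `max_iter` is a safety cap we add because this unoptimized variant comes with no iteration bound here. The names `find_corner`, `cube_points` and `learn_cubes` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Point = Tuple[int, ...]


def find_corner(member: Callable[[Point], bool], start: Point, sign: int) -> Point:
    """Doubling then bisection from `start`; sign=+1 for a max corner, -1 for a min corner."""
    v = list(start)
    i = 0
    while i < len(v):
        def probe(step: int) -> bool:
            w = list(v)
            w[i] += sign * step
            return member(tuple(w))

        if probe(1):
            lo, hi = 1, 2
            while probe(hi):
                lo, hi = hi, 2 * hi
            while hi - lo > 1:
                mid = (lo + hi) // 2
                lo, hi = (mid, hi) if probe(mid) else (lo, mid)
            v[i] += sign * lo
            i = 0
        else:
            i += 1
    return tuple(v)


def cube_points(lo: Point, hi: Point) -> Set[Point]:
    return set(product(*(range(a, b + 1) for a, b in zip(lo, hi))))


def learn_cubes(target: Set[Point], max_iter: int = 100):
    """Add/remove refinement loop over explicit point sets (toy setting)."""
    hyp: Set[Point] = set()
    log: List[Tuple[str, Point, Point]] = []
    for _ in range(max_iter):
        diff = target ^ hyp
        if not diff:
            break
        v = min(diff)                      # toy equivalence oracle
        if v in target:                    # positive counterexample: search X \ H
            space = lambda p: p in target and p not in hyp
            lo = find_corner(space, v, -1)
            hi = find_corner(space, lo, +1)
            hyp = hyp | cube_points(lo, hi)
            log.append(("add", lo, hi))
        else:                              # negative counterexample: search H \ X
            space = lambda p: p in hyp and p not in target
            lo = find_corner(space, v, -1)
            hi = find_corner(space, lo, +1)
            hyp = hyp - cube_points(lo, hi)
            log.append(("remove", lo, hi))
    return hyp, log


target = {(0,), (1,), (2,), (4,)}          # [0, 2] and the isolated point 4
hyp, log = learn_cubes(target)
```

In this one-dimensional run the doubling phase jumps over the hole at 3, so the first step adds the whole interval from 0 to 4 and a second step removes the point 3.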


Hypothesis Representation. Both variants operate on the hypothesis by
applying boolean operations. One can naturally wonder whether hypotheses represented
by unions, symmetric differences and differences of cubes can be handled by
oracles operating on the concept class of finite unions of cubes. As a matter of fact, we
will observe that $H \,\Delta\, X$, $H \setminus X$ and $X \setminus H$ can all be represented in $\mathcal{C}$:


Lemma 1 (Cube intersection and subtraction). Let $C_1 = \mathrm{Cube}(\underline{v}_1, \overline{v}_1)$
and $C_2 = \mathrm{Cube}(\underline{v}_2, \overline{v}_2)$ be two cubes.

Then $C_1 \cap C_2$ is a cube and $C_2 \setminus C_1$ can be written as the disjoint union of $2d$
cubes. Moreover, these computations are effective in $2d$ operations.


Intuitively, subtracting a smaller cube from a larger one results in a
family of cubes, one for each face of the larger cube. There are $2d$ faces for a
cube in dimension $d$.
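One possible way to realise Lemma 1 is a slab decomposition: for each coordinate, peel off the part of the larger cube below and above the intersection. The Python sketch below assumes cubes are pairs of integer tuples; `intersect`, `subtract` and `points` are names of ours, and this particular decomposition is one choice among several valid ones.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Point = Tuple[int, ...]
Cube = Tuple[Point, Point]


def intersect(c1: Cube, c2: Cube) -> Optional[Cube]:
    """Intersection of two cubes, or None when it is empty."""
    lo = tuple(max(a, b) for a, b in zip(c1[0], c2[0]))
    hi = tuple(min(a, b) for a, b in zip(c1[1], c2[1]))
    return (lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None


def subtract(c2: Cube, c1: Cube) -> List[Cube]:
    """Write c2 minus c1 as at most 2d pairwise disjoint cubes."""
    common = intersect(c1, c2)
    if common is None:
        return [c2]
    lo, hi = list(c2[0]), list(c2[1])
    pieces: List[Cube] = []
    for k in range(len(lo)):
        if lo[k] < common[0][k]:           # slab below the intersection
            upper = hi.copy()
            upper[k] = common[0][k] - 1
            pieces.append((tuple(lo), tuple(upper)))
            lo[k] = common[0][k]
        if hi[k] > common[1][k]:           # slab above the intersection
            lower = lo.copy()
            lower[k] = common[1][k] + 1
            pieces.append((tuple(lower), tuple(hi)))
            hi[k] = common[1][k]
    return pieces


def points(c: Cube) -> Set[Point]:
    return set(product(*(range(a, b + 1) for a, b in zip(*c))))


big, small = ((0, 0), (4, 4)), ((1, 1), (2, 2))
pieces = subtract(big, small)
```

Shrinking the working cube after each slab is what keeps the pieces disjoint.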


Learning Union of Integer Hypercubes with Queries 251 


3.3 Repetition-Free Complexity 


In order to analyze the complexity of both variants of the algorithm, we fix a
finite target set $X \in \mathcal{C}_d$ and one of its representations as a union of cubes:

$$X = \bigcup_{i=1}^{n} \mathrm{Cube}(\underline{v}_i, \overline{v}_i)$$


We prove by induction on the iteration step that H can be expressed as a 
union of cubes, whose corners are aligned on a particular set of points: 


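Definition 4 (Abstract grid). For $1 \le k \le d$, we define the sets:

$$\underline{B}_k = \{\overline{v}_i[k] + 1 \mid 1 \le i \le n\} \cup \{\underline{v}_i[k] \mid 1 \le i \le n\}$$

$$\overline{B}_k = \{\overline{v}_i[k] \mid 1 \le i \le n\} \cup \{\underline{v}_i[k] - 1 \mid 1 \le i \le n\}$$

For any $A \subseteq \mathbb{Z}^d$, we write $A \in \mathcal{B}$ whenever $A$ is a finite union of cubes of the form
$\mathrm{Cube}(v, v')$ such that for all $k$, $v[k] \in \underline{B}_k$ and $v'[k] \in \overline{B}_k$.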


Intuitively, $\underline{B}_k$ (resp. $\overline{B}_k$) describes all the possible $k$-th coordinates for minimal
corners (resp. maximal corners). A coordinate for a maximal corner, i.e. a constraint of the
form $x_k \le a$, can become a coordinate for a minimal corner, i.e. a constraint of
the form $x_k \ge a + 1$, when taking the complement during a difference operation,
and vice versa.
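As a small illustration of Definition 4, the two coordinate sets can be computed directly from a list of cubes. The function name `abstract_grid` and the representation of cubes as pairs of tuples are assumptions made for this sketch.

```python
from typing import List, Set, Tuple

Point = Tuple[int, ...]
Cube = Tuple[Point, Point]


def abstract_grid(cubes: List[Cube], k: int) -> Tuple[Set[int], Set[int]]:
    """Candidate k-th coordinates for minimal corners and for maximal corners."""
    lower = {hi[k] + 1 for _, hi in cubes} | {lo[k] for lo, _ in cubes}
    upper = {hi[k] for _, hi in cubes} | {lo[k] - 1 for lo, _ in cubes}
    return lower, upper


cubes = [((0, 0), (2, 3)), ((1, 2), (5, 4))]
lower, upper = abstract_grid(cubes, 0)
```

Both sets have at most $2n$ elements, and the second is the first shifted down by one, which mirrors the complement argument above.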

We observe that $\mathcal{B}$ is closed under union, intersection and difference. In particular,
the overshooting algorithms maintain $H \in \mathcal{B}$, namely the hypothesis always
has minimal (resp. maximal) corners that align with $\underline{B}_k$ (resp. $\overline{B}_k$) on the $k$-th
coordinate. Figure 2 provides an example of such points for a target made of the
union of two cubes.

Fig. 2. Possible minimal and maximal corners for cubes appearing in the hypothesis,
for a given target space


Since the sets $\underline{B}_k$ and $\overline{B}_k$ are of size at most $2n$ for every $k$, there are at
most $(2n)^{2d}$ possible cubes, which is polynomial for a fixed $d$. Assuming $H \in \mathcal{B}$, we can
ensure that Lemma 1 maintains a polynomial representation of the hypothesis
throughout the algorithm until termination.

Although $\mathcal{B}$ is of polynomial size, proving $H \in \mathcal{B}$ is not sufficient to prove
termination of the algorithm in polynomial time, especially if some cubes in $\mathcal{B}$
are added and removed several times. Consider for example Fig. 3, which depicts a
possible run of the algorithm on three aligned cubes by its successive hypotheses:
cube B is added during the first step, but is later covered when the algorithm tries
to learn A but overshoots. Another overshooting happens when trying to remove
the space between A and B, which ends up removing all space between A and
C. The cube B then has to be learned a second time, terminating the algorithm.

Fig. 3. Possible run on three cubes where cube B is added twice to the hypothesis.

To circumvent this issue, we propose an optimization that prevents visiting
the same minimal corner $\underline{v}$ twice. We base our reasoning on the following
observations, where $v_e$ denotes the counterexample:

- If $v_e \in X$, then $\underline{v} \in X$, so $\underline{v}$ should not be later removed.
- If $v_e \notin X$, then $\underline{v} \notin X$, so $\underline{v}$ should not be later added back to $H$.


Algorithm 3 introduces an optimized refinement procedure that keeps track of
the already visited minimal corners. Although an analogous optimization can
be done on the symmetric difference variant, we only discuss REFINEADDREMOVE2 here.

Once a minimal corner $\underline{v}$ for a candidate cube has been found, we continue
the search for a maximal corner $\overline{v}$ by avoiding points that would result in the
removal (resp. addition) of already added (resp. removed) minimal corners.


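Algorithm 3. Optimized refinement avoiding visited minimal corners
Let $V \leftarrow \emptyset$
function REFINEADDREMOVE2($H$, $v_e$, $\Phi_X$)
    if $\Phi_X(v_e)$ then
        Let $\underline{v}$ = FINDMINCORNER($v_e$, $\Phi_{X \setminus H}$)
        Let $\overline{v}$ = FINDMAXCORNER($\underline{v}$, $\Phi_{X \setminus H \setminus \{v \mid \exists v' \in V : \underline{v} \le v' \le v\}}$)
        $V \leftarrow V \uplus \{\underline{v}\}$
        return $H \cup \mathrm{Cube}(\underline{v}, \overline{v})$
    else
        Let $\underline{v}$ = FINDMINCORNER($v_e$, $\Phi_{H \setminus X}$)
        Let $\overline{v}$ = FINDMAXCORNER($\underline{v}$, $\Phi_{H \setminus X \setminus \{v \mid \exists v' \in V : \underline{v} \le v' \le v\}}$)
        $V \leftarrow V \uplus \{\underline{v}\}$
        return $H \setminus \mathrm{Cube}(\underline{v}, \overline{v})$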


Notice how only the maximal corner search benefits from the optimization,
by tracking minimal corners only. As a matter of fact, one could store
the whole visited cubes in the set $V$. However, when a search for a maximal corner is
carried out, the resulting cube will intersect a previously visited cube as soon as the
maximal corner crosses the minimal corner of the visited cube.

We again exploit Remark 1 to build an oracle for every mentioned membership
oracle. Since $V$ is a finite set, one can indeed build a membership oracle
for the set $\{v \mid \exists v' \in V \setminus X : \underline{v} \le v' \le v\}$. Due to this exclusion region, a finer
analysis has to be conducted to prove $H \in \mathcal{B}$.
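Since $V$ is finite, the exclusion region is easy to test point by point. A minimal Python sketch follows; the name `exclusion_oracle` is hypothetical, and `visited` stands for whichever finite set of visited minimal corners is being excluded.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[int, ...]


def exclusion_oracle(visited: Iterable[Point], v_min: Point) -> Callable[[Point], bool]:
    """Oracle for {v | some visited v' satisfies v_min <= v' <= v component-wise}."""
    visited = list(visited)

    def excluded(v: Point) -> bool:
        return any(
            all(a <= b <= c for a, b, c in zip(v_min, w, v))
            for w in visited
        )

    return excluded


excluded = exclusion_oracle({(2, 2)}, (0, 0))
```

A point is excluded exactly when the cube spanned from `v_min` to it would contain a visited corner, so composing this oracle with the search space by set difference keeps the maximal corner search away from those corners.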


Lemma 2. The two optimized variants maintain the following invariants:

1. $V \cap X \subseteq H$;
2. $(V \setminus X) \cap H = \emptyset$;
3. for all $v \in V$, and any $k$, $v[k] \in \underline{B}_k$;
4. $H \in \mathcal{B}$.


Properties 1 and 2 ensure that no $\underline{v}$ is added to $V$ twice.
They also ensure correctness of the algorithm: remark that the search for a
maximal corner is not started from the initial counterexample $v_e$ but from $\underline{v}$,
which is indeed in the search space since $\underline{v} \notin \{v \mid \exists v' \in V : \underline{v} \le v' \le v\}$ (no
point is added twice to $V$). Finally, property 3 ensures that only elements of $(\underline{B}_k)_k$
are added to $V$, hence at most $(2n)^d$ additions.


Proof. At the beginning of the algorithm, $V = H = \emptyset$, satisfying all given
properties. We prove the result by induction on the iteration step:

1. By definition of corner oracles, namely FINDMAXCORNER, if $v \in X$ has been
   added to $V$ during some previous iteration, it was added in the first branch
   (the oracle returns some point in the search region, which excludes $X$ in
   the second branch). Therefore, it was also added to $H$ during this iteration.
   Consider some later iteration removing elements from $H$, namely an iteration
   executing the second branch. Some cube $C = \mathrm{Cube}(\underline{v}', \overline{v}')$ has been computed
   by the corner oracles in this branch such that
   $\overline{v}' \in H \setminus X \setminus \{u \mid \exists v'' \in V : \underline{v}' \le v'' \le u\}$.
   In particular, since $v \in V$, we do not have $\underline{v}' \le v \le \overline{v}'$, hence $v \notin C$
   and $v$ is not removed.

2. Similar to (1) (symmetric case).

3. For every $\underline{v}$ added to $V$, it was produced by a (max) corner query made
   on $X \setminus H$ or $H \setminus X$. Both of these sets are in $\mathcal{B}$ since $H \in \mathcal{B}$ by induction
   hypothesis.

4. Let us prove that the cube $C = \mathrm{Cube}(\underline{v}, \overline{v})$ currently added or removed
   satisfies $C \in \mathcal{B}$ (hence $H \cup C, H \setminus C \in \mathcal{B}$, which will conclude the induction).
   We have already proven that $\underline{v} \in (\underline{B}_k)_k$. We now prove that $\overline{v} \in (\overline{B}_k)_k$, where $\overline{v}$
   is searched over the restricted state space $B = A \setminus \{v \mid \exists v' \in V : \underline{v} \le v' \le v\}$
   for $A = X \setminus H \in \mathcal{B}$ or $A = H \setminus X \in \mathcal{B}$.

   For any $k \in [1, d]$, $\overline{v} + e_k \notin B$ so either:

   - $\overline{v} + e_k \notin A \in \mathcal{B}$, so $\overline{v}[k] \in \overline{B}_k$;

   - or $\overline{v} + e_k \in \{v \mid \exists v' \in V : \underline{v} \le v' \le v\}$, but since $\overline{v}$ is not in the set,
     there exists $v' \in V$ such that $\overline{v}[k] + 1 = v'[k]$. Since $v'[k] \in \underline{B}_k$, we have
     $\overline{v}[k] \in \overline{B}_k$.

This concludes the proof. $\square$


By combining Proposition 1 and Lemma 2, we summarize the complexity of
our overshooting algorithms for a particular target $X = \bigcup_{i=1}^{n} \mathrm{Cube}(\underline{v}_i, \overline{v}_i) \in \mathcal{C}_d$.

Theorem 1 (M+EQ). Both variants of LEARNCUBES terminate in at most
$(2n)^d$ iterations, where an iteration requires:

1. One equivalence query;
2. One corner query, or equivalently, a linear number $O(\mathrm{size}(X))$ of membership
   queries.


This algorithm terminates in polynomial time, for fixed $d$, for any representation
of the target $X$. In particular, the result holds in the worst case where the
representation of $X$ as a finite union of cubes is minimal. As a matter of fact,
the presented exponential bound in $d$ is tight: there exists a target $X \in \mathcal{C}$ and
a pair of corner and equivalence oracles such that both algorithms terminate in
exponential time.


[Figure 4, panels: (a) Overshooting; (b) Remove plane $x_1 = 2$; (c) Remove plane $x_2 = 2$; (d) Remove cube; (e) Remove cube.]

Fig. 4. Exponential blow-up, case $d = 2$


Example 1. Consider $X = \{0, \sum_{i=1}^{d} 2e_i\}$ composed of two cubes. By learning
$\mathrm{Cube}(0, \sum_{i=1}^{d} 2e_i)$, then removing every middle plane of equation $x_k = 1$ for
every $k \in [1, d]$, the resulting hypothesis is left with $2^d - 2$ cubes to remove.
An example with $d = 2$ is depicted in Fig. 4.
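The count in Example 1 can be checked by brute force for small dimensions. In the sketch below, `leftover_points` is a hypothetical helper that builds the hypothesis obtained after removing every middle plane from the overshot cube and returns the points that still have to be removed.

```python
from itertools import product
from typing import Set, Tuple

Point = Tuple[int, ...]


def leftover_points(d: int) -> Set[Point]:
    """Points of the hypothesis outside the target once all planes x_k = 1 are removed."""
    target = {tuple([0] * d), tuple([2] * d)}
    # Cube(0, 2e_1 + ... + 2e_d) without the middle planes is exactly {0, 2}^d.
    hypothesis = {p for p in product(range(3), repeat=d) if 1 not in p}
    return hypothesis - target


counts = {d: len(leftover_points(d)) for d in range(1, 6)}
```

Each leftover point is an isolated unit cube, so the number of remaining removals doubles (up to the constant 2) with every added dimension.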


Whether finite unions of cubes can be learned in polynomial time in the dimension
is left as an open problem, which we relate to DNF formula learning over
$d$ variables, where each term can be interpreted as a cube over $\{0, 1\}^d$.


4 Extensions 


In this section we introduce extensions to the overshooting algorithm from 
Sect. 3.2. While membership and equivalence queries are sufficient for learning 
finite sets, one natural extension of the minimal learner setting is to introduce 
a subset oracle [3]: 


Definition 5 (Subset Oracle). Consider some target concept $X \in \mathcal{C}_d$ for some
concept class $\mathcal{C} = \bigcup_{d \ge 1} \mathcal{C}_d$ and let $\bot, \top \notin D$ be two fresh symbols.

A subset oracle (SUB) for $X$ is a function $\rho_X : \mathcal{C}_d \to \{\top, \bot\}$, which outputs
$\top$ on input $H$ iff $H \subseteq X$.

The definition is similar to the membership oracle from Definition 1 except that the
oracle takes a set instead of a single point as input.


4.1 Maximal Cube Oracle 


Unlike the algorithm of Sect. 3, an algorithm using a subset oracle avoids the
overshooting issue: we can now search for cubes included in the target $X$.
In order to increase the convergence speed, we nonetheless introduce a
maximality criterion on the suitable cubes:


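Definition 6 (Maximal Cubes). A cube $\mathrm{Cube}(\underline{v}, \overline{v})$ is maximal w.r.t. $X$ if

1. $\mathrm{Cube}(\underline{v}, \overline{v}) \subseteq X$
2. For all $i$, $\mathrm{Cube}(\underline{v}, \overline{v} + e_i) \not\subseteq X$
3. For all $i$, $\mathrm{Cube}(\underline{v} - e_i, \overline{v}) \not\subseteq X$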


Figure 5 provides examples of possible maximal cubes in dimension d = 2. 


[Figure 5, panels: (a) 4 maximal cubes when $n = 2$; (b) $n(n + 1)/2$ maximal cubes.]

Fig. 5. Example of maximal cubes w.r.t. a union of $n$ cubes


Next, we modify the corner oracle from Sect. 3.1 to use subset queries. Again,
we only define the algorithm to find a maximal corner; the minimal corner algorithm can
be implemented analogously. The algorithm first computes a lower and an upper
bound for the subsequent binary search. The computation is shown in the function
COMPUTEMAXBOUNDS. Given a cube defined by its minimal and maximal
corners, the value of coordinate $i$ is increased as long as the resulting cube is still
a subset of the target set $X$. The upper bound $\overline{b}$ corresponds to the first negative reply by
the oracle and the lower bound $\underline{b}$ to the last positive response. A binary search is
then made between $\underline{b}$ and $\overline{b}$ in the FINDMAXINCCORNER function.


4.2 Maximal Cube Algorithm 


Algorithm 5 presents a procedure that iteratively refines the hypothesis: for any
counterexample point, the algorithm searches for a maximal cube w.r.t. the target
that contains this point and adds it to the hypothesis. One can check that both procedure
calls are valid, as $H \subseteq X$ is an invariant. At every iteration the counterexample
$v$ satisfies $v \in X \setminus H$. The use of the subset oracle ensures that the function
FINDMAXINCCORNER always returns a point $\overline{v}$ such that $\mathrm{Cube}(v, \overline{v}) \subseteq X$.
Similarly, the function FINDMININCCORNER always returns a corner $\underline{v}$ such that
$\mathrm{Cube}(\underline{v}, \overline{v}) \subseteq X$. The resulting cube is then added to the hypothesis, ensuring that
point $v$ is never visited again as a counterexample. This entails the termination
of the algorithm in at most $|X|$ iterations of the main loop. A better bound will
be explored in Sect. 4.4.

Algorithm 4. Maximal corner of a maximal cube, in $O(\mathrm{size}(X))$ subset queries
Ensure: Returned value is a maximal corner of $X$
function FINDMAXINCCORNER($\underline{v}$, $\overline{v}$, $\rho_X$)
    for $i \in [1, d]$ do
        $(\underline{b}, \overline{b})$ = COMPUTEMAXBOUNDS($\underline{v}$, $\overline{v}$, $i$, $\rho_X$)
        while $\underline{b} \ne \overline{b}$ do
            $m \leftarrow (\underline{b} + \overline{b}) \div 2$
            if $\rho_X(\mathrm{Cube}(\underline{v}, \overline{v}[i/m]))$ then
                $\underline{b} \leftarrow m$
            else
                $\overline{b} \leftarrow m$
    return $\overline{v}$
function COMPUTEMAXBOUNDS($\underline{v}$, $\overline{v}$, $i$, $\rho_X$)
    $\delta \leftarrow 1$
    while $\rho_X(\mathrm{Cube}(\underline{v}, \overline{v} + \delta \cdot e_i))$ do
        $\delta \leftarrow 2 \cdot \delta$
    return $(\overline{v}[i] + \delta/2, \overline{v}[i] + \delta)$

Algorithm 5. The maximal cube algorithm
function LEARNMAXCUBE($\rho_X$, $\Psi_X$)
    Let $H \leftarrow \emptyset$
    while $(v \leftarrow \Psi_X(H)) \ne \top$ do
        Let $\overline{v} \leftarrow$ FINDMAXINCCORNER($v$, $v$, $\rho_X$)
        Let $\underline{v} \leftarrow$ FINDMININCCORNER($v$, $\overline{v}$, $\rho_X$)
        $H \leftarrow H \cup \mathrm{Cube}(\underline{v}, \overline{v})$
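The maximal cube loop can be sketched in Python on a finite target. This is an illustration under our own assumptions, not the prototype: the subset oracle is simulated by enumerating the points of the queried cube, the equivalence oracle returns the smallest missing point, and the bounds of the bisection are handled slightly differently from Algorithm 4 (the search stops when the two bounds are adjacent). The names `extension`, `cube_points` and `learn_max_cube` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Point = Tuple[int, ...]
Cube = Tuple[Point, Point]


def cube_points(lo: Point, hi: Point) -> Set[Point]:
    return set(product(*(range(a, b + 1) for a, b in zip(lo, hi))))


def extension(subset: Callable[[Point, Point], bool],
              lo: Point, hi: Point, i: int, sign: int) -> int:
    """Largest t >= 0 such that the cube grown by t along coordinate i stays in X."""
    def ok(t: int) -> bool:
        l, h = list(lo), list(hi)
        if sign > 0:
            h[i] += t
        else:
            l[i] -= t
        return subset(tuple(l), tuple(h))

    if not ok(1):
        return 0
    a, b = 1, 2
    while ok(b):                 # doubling: ok(a) holds, look for a failing b
        a, b = b, 2 * b
    while b - a > 1:             # bisection on a monotone predicate
        m = (a + b) // 2
        a, b = (m, b) if ok(m) else (a, m)
    return a


def learn_max_cube(target: Set[Point]):
    subset = lambda lo, hi: cube_points(lo, hi) <= target   # toy subset oracle
    hyp: Set[Point] = set()
    cubes: List[Cube] = []
    while target - hyp:
        v = min(target - hyp)    # toy equivalence oracle: H is always inside X
        hi = list(v)
        for i in range(len(v)):
            hi[i] += extension(subset, v, tuple(hi), i, +1)
        lo = list(v)
        for i in range(len(v)):
            lo[i] -= extension(subset, tuple(lo), tuple(hi), i, -1)
        cubes.append((tuple(lo), tuple(hi)))
        hyp |= cube_points(tuple(lo), tuple(hi))
    return hyp, cubes


target = cube_points((0, 0), (3, 1)) | cube_points((0, 0), (1, 3))
hyp, cubes = learn_max_cube(target)
```

Because a larger cube is included in the target only if every smaller one is, the predicate tested by the bisection is monotone, which is what distinguishes this search from the membership-based search of Sect. 3.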


4.3 Extension to the Infinite Case 


We now discuss one possible extension to the infinite case, namely when cubes
are possibly unbounded and may contain infinitely many points.

We adapt our learning formalism to deal with infinite bounds: for the remainder
of the section we extend the discrete lattice $\mathbb{Z}^d$ to $(\mathbb{Z} \uplus \{+\infty, -\infty\})^d$
and extend $\le$ trivially over the newly introduced points. For $\underline{v}, \overline{v} \in (\mathbb{Z} \uplus
\{+\infty, -\infty\})^d$, the definition of $C = \mathrm{Cube}(\underline{v}, \overline{v})$ remains unchanged; in particular,
$C \subseteq \mathbb{Z}^d$ but may be infinite. The concept class $\mathcal{C}$, hence the domain of oracle
functions, is augmented with all finite unions of cubes with (possibly) infinite
bounds.

A possible approach to tackle this problem in the minimally adequate teacher
(M+EQ) formalism consists in running the overshooting algorithm of Sect. 3 on
the state space restricted to some cube of width $2^k$ centered at $0$ and gradually
increasing $k$ if counterexamples outside this restriction are found. This method is
discussed in the extended version of the present article [25] but we focus here on
a LEARNMAXCUBE adaptation exploiting subset queries (SUB+EQ).

While Algorithm 5 remains unchanged, we need to adjust the functions
FINDMAXINCCORNER and FINDMININCCORNER, as those are not able
to accelerate the search to infinity. Algorithm 6 achieves this goal by simply
overriding the COMPUTEMAXBOUNDS and COMPUTEMINBOUNDS subroutines
in order to check for possible $+\infty$ and $-\infty$ bounds. Whenever such a bound is
returned, no further binary search occurs for this coordinate (constant time).


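Algorithm 6. Maximal bound overriding, checking for $+\infty$.
function COMPUTEMAXBOUNDS($\underline{v}$, $\overline{v}$, $i$, $\rho_X$)
    if $\rho_X(\mathrm{Cube}(\underline{v}, \overline{v}[i/{+\infty}]))$ then
        return $(+\infty, +\infty)$
    else    ▷ We refer to the original COMPUTEMAXBOUNDS of Algorithm 4
        return SUPER.COMPUTEMAXBOUNDS($\underline{v}$, $\overline{v}$, $i$, $\rho_X$)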
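The override of Algorithm 6 can be illustrated with `math.inf` standing for $+\infty$. The sketch below makes two simplifying assumptions of ours: the toy subset oracle only handles a target made of a single (possibly unbounded) cube, and the lower bound uses integer division so that it equals the current coordinate when the very first probe fails. The names `make_subset_oracle` and `compute_max_bounds` are hypothetical.

```python
import math
from typing import Callable, Tuple

Bound = Tuple[float, ...]


def make_subset_oracle(t_lo: Bound, t_hi: Bound) -> Callable[[Bound, Bound], bool]:
    """Toy subset oracle for a target consisting of one cube with possibly infinite bounds."""
    def subset(lo: Bound, hi: Bound) -> bool:
        return all(a <= l and h <= b for a, l, h, b in zip(t_lo, lo, hi, t_hi))
    return subset


def compute_max_bounds(lo: Bound, hi: Bound, i: int, subset) -> Tuple[float, float]:
    probe = list(hi)
    probe[i] = math.inf
    if subset(lo, tuple(probe)):          # unbounded along coordinate i
        return (math.inf, math.inf)
    delta = 1
    while True:                           # doubling phase of Algorithm 4
        probe[i] = hi[i] + delta
        if not subset(lo, tuple(probe)):
            break
        delta *= 2
    return (hi[i] + delta // 2, hi[i] + delta)


subset = make_subset_oracle((0, 0), (math.inf, 5))
unbounded = compute_max_bounds((0, 0), (0, 0), 0, subset)
bounded = compute_max_bounds((0, 0), (0, 0), 1, subset)
```

When both returned bounds are infinite the caller can skip the bisection for that coordinate, which is the constant-time case mentioned above.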


4.4 Complexity 


Termination of LEARNMAXCUBE was proved using cardinality arguments
in Sect. 4.2. These arguments obviously do not apply in the case where the target
set is infinite. Moreover, we are interested in a finer complexity analysis.

As in Sect. 3.3, we fix a target representation $X = \bigcup_{i=1}^{n} \mathrm{Cube}(\underline{v}_i, \overline{v}_i) \in \mathcal{C}_d$ and
study the algorithm complexity with respect to $\sum_{i=1}^{n} \mathrm{size}(\underline{v}_i) + \mathrm{size}(\overline{v}_i)$.
As some of the vectors $v$ may contain infinite coordinates, we carefully specify
$\mathrm{size}(+\infty) = \mathrm{size}(-\infty) = 1$ and keep the usual definition of $\mathrm{size}(v)$.


Theorem 2 (SUB+EQ). LEARNMAXCUBE terminates in at most $n^{2d}$ iterations,
where an iteration requires:

1. One equivalence query;
2. One maximal cube query, or equivalently, a linear number $O(\mathrm{size}(X))$ of subset
   queries.


Proof. At every iteration, one equivalence query is performed, then FINDMAXINCCORNER
and FINDMININCCORNER perform a binary search, resulting in a
linear number of subset queries (the proof is similar to that of Proposition 1).

In order to analyze the number of iterations of the main loop, let us first
remark that each maximal cube is added only once: if we write $v_k$ for the
$k$-th counterexample and $C_k$ for the learned maximal cube, then $v_{k+1} \in X \setminus \bigcup_{i=1}^{k} C_i$
and $v_{k+1} \in C_{k+1}$, so $C_{k+1} \ne C_i$ for every $i \in [1, k]$.

The number of iterations is therefore bounded by the number of maximal
cubes. We now proceed to bound the number of maximal cubes. Let
$C = \mathrm{Cube}(\underline{v}, \overline{v})$ be a maximal cube w.r.t. $X$. For any $k \in [1, d]$ there exist
$i, j \in [1, n]$ such that $\underline{v}[k] = \underline{v}_i[k]$ and $\overline{v}[k] = \overline{v}_j[k]$, hence at most $n^2$ possibilities
for coordinate $k$, and at most $n^{2d}$ maximal cubes overall. $\square$


As in Theorem 1, the number of iterations is polynomial in the number of
cubes $n$ but exponential in the dimension $d$. As opposed to the LEARNCUBES
algorithm, the bound is not tight, as the example of Fig. 5b provides only a quadratic
number of maximal cubes. As the maximal cube concept can be
related to the notion of prime implicant, examples of DNF formulas with an
exponential number of prime implicants (see for example [8]) can be translated into
unions of cubes with an exponential number of maximal 0-1 cubes.

From a practical perspective, one can nonetheless argue that LEARNMAXCUBE
is likely to perform well, since it avoids the overshooting problem
mentioned in Example 1: $H \subseteq X$ is an invariant. In fact, one can easily check
that if there are no adjacent¹ cubes, the number of iterations becomes linear.


5 Applications and Experiments 


In this section, we describe an immediate application of our learning algorithms 
to monadic decomposition of quantifier-free Presburger formulas [15,29]. We 
then report on experimental comparisons between our algorithms and existing 
methods for the problem. 


5.1 Application to Monadic Decomposition 


Here we consider quantifier-free linear integer arithmetic formulas without modulo
arithmetic:

$$\varphi ::= a_1 \sim a_2 \mid \varphi \wedge \varphi \mid \varphi \vee \varphi,$$

where $\sim\ \in \{\le, \ge, =\}$, and $a_1, a_2$ are integer linear combinations of the variables
$x_1, \ldots, x_n$, i.e., $a_i$ is of the form $c_0 + \sum_{j=1}^{n} c_j \cdot x_j$, where each $c_j \in \mathbb{Z}$. The formula
$\varphi(\bar{x})$ is said to be satisfiable (written $(\mathbb{Z}; +) \models \varphi$) if there exists an assignment
$\sigma$ of $\bar{x}$ to $\mathbb{Z}$ such that the formula becomes true. Of course, this is just a simple
fragment of the first-order theory of integer linear arithmetic and the notion
of $(\mathbb{Z}; +) \models \varphi$ can be defined in the same way even with quantifiers [14,18].
A formula $\varphi$ is said to be monadic if it has only one variable. Every monadic
formula $\varphi(x)$ in this fragment can be easily transformed into a union of integer
intervals of the form: (1) $l \le x \wedge x \le u$ where $l, u \in \mathbb{Z}$, (2) $l \le x$ where $l \in \mathbb{Z}$, (3)
$x \le u$ where $u \in \mathbb{Z}$, or (4) $\top$ or $\bot$.

A monadic decomposition [29] of a formula $\varphi(\bar{x})$ is a boolean combination
$\psi(\bar{x})$ of monadic formulas that is equivalent to $\varphi$ over the theory, i.e., $(\mathbb{Z}; +) \models
\forall \bar{x}\,(\varphi \leftrightarrow \psi)$. Of course, not all formulas admit a monadic decomposition (e.g., $x =
y$). It was shown in [15] that deciding whether a formula in the theory is monadically
decomposable is coNP-complete². Veanes et al. [29] provide a generic
semi-decision procedure for computing a monadic decomposition of a quantifier-free
formula as an if-then-else formula; the procedure is applicable to most theories
considered in SMT. Despite its genericity, the procedure performs well in practice, e.g.,
as the authors showed in their benchmarks in [29].


¹ Two cubes $C_1$ and $C_2$ are adjacent if $\min\{\sum_i |v_1[i] - v_2[i]| \mid v_1 \in C_1, v_2 \in C_2\} \le 1$.

² The proof in [15] uses modulo constraints to show that monadic decomposition of a
two-variable formula $\varphi(x, y)$ is coNP-complete. Modulo constraints could be easily
removed by allowing more integer variables.

The application of our learning algorithms to computing monadic decompositions
arises from the following observation. Since each monadic decomposition
can be transformed into DNF, a monadic decomposition of a formula
$\varphi(\bar{x})$ over $(\mathbb{Z}; +)$ can be constructed as a finite union of (possibly infinite)
hypercubes, where an infinite hypercube arises when a variable is either not
bounded from above or not bounded from below (or both). Conversely, a finite
union $H$ of possibly infinite hypercubes can also be easily transformed into
a boolean combination of monadic formulas $\varphi_H$. For example, the formula
$(0 \le x \le 5 \wedge 3 \le y \le 10) \vee (8 \le x)$ corresponds to the union of hypercubes
$\mathrm{Cube}((0, 3), (5, 10)) \cup \mathrm{Cube}((8, -\infty), (+\infty, +\infty))$. Furthermore, all relevant oracles
admit a straightforward implementation:


- A membership query $\Phi$ requires checking $(\mathbb{Z}; +) \models \varphi(\bar{v})$ for the queried point $\bar{v}$, which can be checked
  in polynomial time because $\varphi$ is quantifier-free.
- An equivalence query for a hypothesis $H$ can be reduced to checking

  $$(\mathbb{Z}; +) \models (\varphi_H \wedge \neg\varphi) \vee (\varphi \wedge \neg\varphi_H).$$

  This is a single satisfiability check of a quantifier-free integer linear arithmetic
  formula, for which highly optimized solvers exist (e.g., Z3 [26]).
- A subset query for a hypothesis $H$ can similarly be reduced to checking

  $$(\mathbb{Z}; +) \models (\varphi_H \wedge \neg\varphi).$$

  This is also a single satisfiability check over $(\mathbb{Z}; +)$.
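The shape of these reductions can be illustrated without a solver. The sketch below replaces the satisfiability checks by brute-force enumeration over a bounded box, which is an assumption made purely for illustration (the prototype relies on Z3 instead, and a bounded box cannot certify equivalence over all of $\mathbb{Z}^d$). The names `in_hypothesis`, `equivalence_query` and `subset_query` are hypothetical.

```python
import math
from itertools import product
from typing import Callable, Iterable, List, Optional, Tuple

Bound = Tuple[float, ...]
Cube = Tuple[Bound, Bound]


def in_hypothesis(cubes: List[Cube], p: Tuple[int, ...]) -> bool:
    """Evaluate the formula of the hypothesis (a union of cubes) at point p."""
    return any(all(a <= x <= b for a, x, b in zip(lo, p, hi)) for lo, hi in cubes)


def equivalence_query(phi: Callable[..., bool], cubes: List[Cube],
                      d: int, box: Iterable[int]) -> Optional[Tuple[int, ...]]:
    """Return a point where phi and the hypothesis disagree, or None if none is found."""
    for p in product(list(box), repeat=d):
        if phi(*p) != in_hypothesis(cubes, p):
            return p
    return None


def subset_query(phi: Callable[..., bool], cubes: List[Cube],
                 d: int, box: Iterable[int]) -> bool:
    """True when no point of the box satisfies the hypothesis but not phi."""
    return all(phi(*p) for p in product(list(box), repeat=d) if in_hypothesis(cubes, p))


phi = lambda x, y: (0 <= x <= 5 and 3 <= y <= 10) or 8 <= x
full = [((0, 3), (5, 10)), ((8, -math.inf), (math.inf, math.inf))]
partial = [((0, 3), (5, 10))]
box = range(-2, 13)
```

A disagreement point plays the role of the model returned by the solver for the equivalence formula, and an empty search corresponds to an unsatisfiable check.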


This allows us to apply both of our learning algorithms to the problem. 
Monadic decomposition has numerous applications including quantifier elimi- 
nation [29], string solving [15], and symbolic finite automata/transducers [13,29], 
among others. In the following example we illustrate how our learning algo- 
rithm(s) could be applied to improving quantifier elimination for the theory of 
linear integer arithmetic. 


Example 2. Consider a formula of the form $\forall \bar{x}\, \exists \bar{y}\; \varphi(\bar{x}, \bar{y})$, where $\varphi$ is a formula in
linear integer arithmetic without modulo constraints. Suppose that $\varphi$ is monadically
decomposable, and is equivalent to the formula $\bigvee_i D_i(\bar{x}, \bar{y})$, where each
disjunct $D_i$ is a conjunction of monadic predicates over the variables $\bar{x} \cup \bar{y}$. We assume
w.l.o.g. that each $D_i$ is satisfiable. Then, this formula is equisatisfiable (over
linear integer arithmetic) to $\psi := \forall \bar{x}\, \big(\bigvee_i D_i(\bar{x}, \bar{c}_i)\big)$, where $\bar{y}$ in $D_i$ are replaced
by fresh constants $\bar{c}_i$ (i.e. two distinct $D_i$, $D_j$ use different constants). This can
be proven by a simple application of skolemization, and observing that each
occurrence of $f(\bar{x})$ in any disjunct is of the form $a \le f(\bar{x}) \le b$, where $a \in
\{-\infty\} \cup \mathbb{Z}$ and $b \in \mathbb{Z} \cup \{\infty\}$, implying that $f(\bar{x})$ can be replaced by a single
constant, which does not depend on $\bar{x}$. Finally, let $D_i'$ be the conjuncts in $D_i$ only
involving variables in $\bar{x}$. Checking that $\psi$ is true reduces to checking satisfiability
of $\bigwedge_i \neg D_i'$.

To make this example concrete, we consider the formula $\forall x\, \exists y\, (x \ge 0 \rightarrow
x + y \ge 5 \wedge y \ge 0)$. A monadic decomposition of the quantifier-free part is
$x < 0 \vee \bigvee_{i=0}^{5} (x \ge i \wedge y \ge 5 - i)$. Therefore, checking the above formula can be
reduced to satisfiability of $x \ge 0 \wedge \bigwedge_{i=0}^{5} x < i$, which is not satisfiable.

5.2 Experiments


In order to assess the performance of the algorithms FINDMAXCORNER and 
FINDMINCORNER respectively introduced in Sect.3 and Sect. 4, we consider 
prototype implementations. The following prototypes and experiments can be 
found in [24]. 


Variants. Although the methods were presented with binary search strategies
in mind, we also implemented a more naive unary search procedure to obtain
the corners. As later noticed in the experiments, unary search may be preferred
for very small cubes and performs especially well for cubes which are based on
0-1 integer programs, while binary search achieves better performance for larger
cubes. Consequently, we refer to a third variation of the algorithm called “optimized”,
combining unary search for small instances and binary search for large
values. More precisely, two variants of the overshooting algorithm from Sect. 3
and three variants of the max cubes algorithm from Sect. 4 are presented, called
respectively overshoot_unary and overshoot_binary, and max_unary, max_binary
and max_optimized.
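The trade-off between the two strategies can be seen by counting oracle calls on a single coordinate. The following sketch is illustrative only: `unary_border` and `binary_border` are hypothetical names, and the oracle is a simple predicate on the step size that holds up to an interval length `L`.

```python
from typing import Callable, Tuple


def unary_border(probe: Callable[[int], bool]) -> Tuple[int, int]:
    """Advance one unit at a time; returns (border, number of queries)."""
    queries, step = 0, 0
    while True:
        queries += 1
        if not probe(step + 1):
            return step, queries
        step += 1


def binary_border(probe: Callable[[int], bool]) -> Tuple[int, int]:
    """Doubling followed by bisection; returns (border, number of queries)."""
    queries = 1
    if not probe(1):
        return 0, queries
    lo, hi = 1, 2
    while True:
        queries += 1
        if not probe(hi):
            break
        lo, hi = hi, 2 * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return lo, queries


small = lambda s: s <= 2      # border after 2 steps
large = lambda s: s <= 100    # border after 100 steps
```

On the short interval the unary search needs fewer queries than the binary one, while on the long interval the binary search needs far fewer, which is the motivation for combining both in the optimized variant.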


Tool Comparison. Evaluation is performed against a generic monadic decomposition
procedure mondec₁, from [29] by Veanes et al., which works over an
arbitrary base theory and outputs an if-then-else formula, which could be exponentially
more succinct than a formula in DNF. The algorithm, which exploits
the python-Z3 framework [26], uses a kind of decision tree search heuristic to
split the input into monadic predicates.


Implementation. Similarly to mondec₁, our prototype is implemented in Python
using the python-Z3 framework, but it is specialized in handling linear integer
arithmetic formulas, and its output formulas are in DNF, unlike those of mondec₁.
For monadic decomposition applications, oracle queries are converted to appropriate
Z3 satisfiability queries since a (possibly non-monadic) representation of
the target set is already known.


[Figure 6, panels: (a) 50 overlapping cubes and the diagonal $x + y = 50$; (b) 100 big cubes.]

Fig. 6. Benchmarks for $\mathbb{Z}^2$.

5.3 Benchmark Suite


Our benchmark suite is restricted to the problem of monadic decomposition of
linear integer arithmetic, and its purpose is to stress-test our learning algorithms
and mondec₁ against various kinds of “extreme conditions”. The suite consists
of six classes of monadically decomposable example formulas, which were constructed
to test five features (see below). Note that the given formulas themselves
might contain non-monadic predicates.

The five features (left to right in Table 1) represent the presence of (1) a large amount
of cube overlaps, (2) a large number of cubes, (3) a large cube, (4) large dimension, and
(5) an unbounded cube. We hypothesized that these five features play important roles
in how fast the algorithms perform, which is indeed validated by our experimental
results. The six classes of formulas are elaborated below.

Table 1. Features of conducted benchmarks. A “+” (resp. “-”) indicates a high (resp. low) presence of a
feature.
[Table 1: rows (a) to (f); columns Overlap, # Cubes, |Cube|, Dimension, Unbounded.]

(a) K Diagonal Restricted consists of $K$ overlapping cubes of length and width
2 and one diagonal as shown in Fig. 6a. Each cube overlaps with at most two
other cubes, and the cubes stack up diagonally. The algorithms need to return all the
cubes left of the diagonal.

(b) 10 cubes in $\mathbb{Z}^d$ consists of $K = 10$ overlapping cubes of size $2^d$ stacking up
diagonally, similar to the benchmark K Diagonal Restricted but without the diagonal
restriction.

(c) K Diagonal Unrestricted is a variation of Fig. 6a where the algorithms need
to return all the cubes and all the points on the diagonal.

(d) K Big Overlapping Cube is a benchmark testing large cubes as depicted
in Fig. 6b. It consists of $K$ overlapping cubes of length and width 100 that
stack up diagonally like in the benchmark K Diagonal
Restricted.

(e) K Diagonal is built as the set of points along the diagonal $x = y < K$.

(f) Example 2 is generalized to any $K \in \mathbb{N}$ by $x \ge 0 \rightarrow x + y \ge K \wedge y \ge 0$. Its
unbounded nature makes it tractable by max_optimized and mondec₁ only.


5.4 Results 


Experiments were conducted on an AMD Ryzen 5 1600 Six-Core CPU with
16 GB of RAM running Windows 10. The results are summarized in Fig. 7,
where each graph represents one benchmark comparing the run times of each
algorithm.

[Figure 7, surviving panel titles: (e) Benchmark on K Diagonal in $\mathbb{Z}^2$, the x-axis encodes the maximal value for $x$ and $y$; (f) Benchmark on Example 2 in $\mathbb{Z}^2$, the x-axis encodes parameter $K$. Legend: overshooting_unary, overshooting_binary, maxcube_unary, maxcube_binary, maxcube_optimized, mondec₁.]

Fig. 7. Benchmark results. The y-axis encodes the time in seconds. The timeout is set
to 1800 s.

The overshooting phenomenon can be observed in Fig. 7c and Fig. 7e with
its quadratic shape, as $d = 2$. In Fig. 7b, the running time quickly diverges as $d$
increases, as anticipated by Example 1.

When the considered cubes are small, as in Fig. 7a and Fig. 7c, the unary 
search algorithms outperform their binary counterparts, meaning the few addi- 
tional queries made by the binary search are more costly than a direct enumer- 
ation. The optimized variant is therefore a good compromise in all cases. 

Figure 7d depicts a benchmark with many large cubes for a fixed dimension. 
While the impact of the overshooting phenomenon remains contained, the max- 
cube unary search variant is particularly slow. This can be explained by the 
size of the cubes making unary search inefficient, combined with the already 
expensive cost of every single inclusion query. 

The mondec₁ algorithm is comparable to the overshooting algorithms in
Fig. 7e. It also performs particularly well in Fig. 7f, which we conjecture is due
to the conciseness of the solution in the if-then-else form used by mondec₁.

Overall, the maxcube algorithm in its optimized form is the most stable
algorithm for this benchmark set and should be preferred when an inclusion
oracle is available. The extra cost of these queries is taken into account here
and remains affordable when implemented with Z3 queries.


6 Conclusion and Future Work 


We have presented a polynomial-time algorithm in Angluin’s exact learning
framework using membership and equivalence queries for learning a finite union of
rectilinear cubes over Z^d for any fixed dimension d. By considering an additional
subset oracle, learning possibly infinite cubes can be achieved with the same
complexity, but with a simpler and faster learning algorithm in practice. The technique
enables the introduction of auxiliary oracles, namely the corner (resp. maximal 
cube) oracle when a membership (resp. subset) oracle is provided. While ora- 
cles for subset queries tend to be difficult to implement, this turns out not to 
be the case for our proposed application of computing monadic decompositions 
of quantifier-free integer linear arithmetic formulas without modulo constraints, 
which is successfully solved by our algorithm. 

We mention three future research directions. First, extensions to modulo 
operations could be explored, by encoding periodicity on d additional coordi- 
nates and providing adequate oracles on the encoded target. A second direc- 
tion consists in applying these learning techniques to the verification of systems 
by learning invariants which are monadically decomposable in a small num- 
ber of cubes. Lastly, one promising direction to further improve our algorithms 
is to investigate how to leverage if-then-else formula representations as used in 
mondec₁ [29], which could be exponentially more succinct than formulas in DNF.
Abstract. We present a new model-based interpolation procedure for
satisfiability modulo theories (SMT). The procedure uses a new mode of 
interaction with the SMT solver that we call solving modulo a model. This 
either extends a given partial model into a full model for a set of assertions 
or returns an explanation (a model interpolant) when no solution exists. 
This mode of interaction fits well into the model-constructing satisfiability 
(MCSAT) framework of SMT. We use it to develop an interpolation pro- 
cedure for any MCSAT-supported theory. In particular, this method leads 
to an effective interpolation procedure for nonlinear real arithmetic. We 
evaluate the new procedure by integrating it into a model checker and
comparing it with state-of-the-art model-checking tools for nonlinear arithmetic.


Keywords: Satisfiability modulo theories · Craig interpolation ·
Nonlinear arithmetic


1 Introduction 


Craig interpolation is one of the central reasoning tools in modern verification 
algorithms. Verification techniques such as model checking rely on Craig inter- 
polation [11,39] as a symbolic learning oracle that drives abstraction refinement 
and invariant inference. Interpolation has been studied for many fragments of 
first-order logic that are useful in practice, such as linear arithmetic [23], unin- 
terpreted functions [9,37], arrays [25,38], and sets [32]. In these fragments, a 
typical interpolation procedure constructs interpolants by traversing the clausal 
proof of unsatisfiability provided by an SMT solver [26,34,41] while performing 
interpolation locally at proof nodes. A major missing piece in the class of
fragments supported by interpolating SMT solvers is nonlinear arithmetic,¹ as the
complex reasoning required for nonlinear arithmetic makes fine-grained symbolic
proof generation extremely difficult.

¹ By nonlinear arithmetic we mean Boolean combinations of arithmetic constraints over
arbitrary-degree polynomials.

We present an approach to interpolation that is driven by models rather 
than proofs. Given a pair of formulas A and B such that A ∧ B is unsatisfiable,
an interpolant is a formula I that is implied by A and inconsistent with B.
Recent model-based decision procedures, specifically the ones developed within 
the MCSAT [13,28] framework for SMT, are internally naturally interpolating. 
But, rather than interpolating two formulas, they provide a way to interpolate a 
set of constraints against a partial model. We capitalize on this internal ability, 
and extend it so that a formula A can be checked and interpolated against 
a partial model (model interpolation). This is closely related to the ability of 
modern SAT solvers to perform solving modulo assumptions [17], a technique 
that can also be used to provide interpolation capabilities in finite-state model
checking [3]. 

We take advantage of model interpolation to build a formula-interpolation 
procedure through a simple idea: we can compute an interpolant of formulas 
A and B by iteratively interpolating (and refuting) all models of B with model 
interpolants from A. We develop the interpolation procedure within the MCSAT 
framework. This immediately allows us to generate interpolants for any theory 
supported by the framework. As MCSAT provides efficient complete solvers for 
nonlinear real arithmetic [27,29], we develop the first complete interpolation 
procedure for real nonlinear arithmetic. 

To show that this new interpolation procedure is an effective tool that can 
be used on real-world problems, we integrate it into a model checker that uses 
interpolation for inferring k-inductive invariants. We evaluate this model checker 
on a set of industrial benchmarks. Our evaluation shows that the new procedure 
is highly effective, both in terms of speed, and the ability to support the model 
checker in its quest for counter-examples and invariants. 


Outline. Section 2 gives background on SMT, interpolation, and nonlinear arith- 
metic. Section 3 presents solving modulo a model and model interpolation, and 
develops the general interpolation procedure. In Sect. 4, we discuss the
particular needs of nonlinear arithmetic. In Sect. 5 we evaluate our implementation on
nonlinear model-checking problems. We conclude in Sect. 6 and provide future
research directions. 


2 Background 


We assume that the reader is familiar with the usual notions and terminology 
of first-order logic and model theory (for an introduction see, e.g., [1]). 


Nonlinear Arithmetic. As usual, we denote the ring of integers with Z and the
field of real numbers with R. Given a vector of variables x we denote the set
of polynomials with integer coefficients and variables x as Z[x]. A polynomial
f ∈ Z[y, x] is of the form

\[ f(y, x) = a_m \cdot x^{d_m} + a_{m-1} \cdot x^{d_{m-1}} + \cdots + a_1 \cdot x^{d_1} + a_0, \]

where 0 < d_1 < ··· < d_m, and the coefficients a_i are polynomials in Z[y] with
a_m ≠ 0. We call x the top variable and the highest power d_m is the degree of
the polynomial f. As usual, we denote with f^(k) the k-th derivative of f in its
top variable. A number α ∈ R is a root of the polynomial f ∈ Z[x] if f(α) = 0.

A polynomial constraint C is a constraint of the form f ∇ 0 where f is a
polynomial and ∇ ∈ {<, ≤, =, ≥, >}. If the polynomial f = f(x) is univariate then
we also say that C is univariate. An atom is either a polynomial constraint or a
Boolean variable, and formulas are defined inductively with the usual Boolean
connectives (∧, ∨, ¬). The symbols ⊤ and ⊥ denote true and false, respectively.
In addition to the basic polynomial constraints, we will also be working with
extended polynomial constraints. An extended polynomial constraint F is of the
form x ∇_r root(f, k, x) where f ∈ Z[y, x] and ∇_r ∈ {<_r, ≤_r, =_r, ≥_r, >_r}. The
semantics of this predicate is the following. Given an assignment that gives real
values v to the variables y, the roots of f(v, x) can be ordered over R. If
the polynomial f(v, x) has at least k real roots and α_k is the k-th smallest root,²
then the constraint is equivalent to x ∇ α_k. Otherwise, the constraint evaluates
to ⊥. For example, the constraint x <_r root(x² − 2, 2, x) represents x < √2.
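
The semantics of extended polynomial constraints can be illustrated with a small Python sketch. It is not part of the procedure described here: it is restricted to univariate quadratics a·x² + b·x + c with a ≠ 0 so that the standard library suffices, it uses floating-point roots rather than exact algebraic numbers, and the helper names `real_roots_quadratic` and `holds_root_constraint` are hypothetical.

```python
import math
import operator

# Comparison symbols allowed in an extended constraint  x ∇_r root(f, k, x).
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def real_roots_quadratic(a, b, c):
    """Sorted distinct real roots of a*x^2 + b*x + c, assuming a != 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    if disc == 0:
        return [-b / (2 * a)]
    s = math.sqrt(disc)
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])


def holds_root_constraint(x, op, coeffs, k):
    """Evaluate  x op root(f, k, x)  for the quadratic f given by coeffs=(a, b, c)."""
    roots = real_roots_quadratic(*coeffs)
    if k < 1 or k > len(roots):
        return False  # fewer than k real roots: the constraint evaluates to false
    return OPS[op](x, roots[k - 1])


# x <_r root(x^2 - 2, 2, x) stands for x < sqrt(2).
below = holds_root_constraint(1.0, "<", (1, 0, -2), 2)
above = holds_root_constraint(2.0, "<", (1, 0, -2), 2)
missing_root = holds_root_constraint(0.0, "<", (1, 0, -2), 3)
```

In this sketch a constraint that asks for a root that does not exist (here the third root of x² − 2) is simply false, mirroring the "evaluates to ⊥" case above.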

Given a formula F(x) we say that a type-consistent variable assignment
M = {x ↦ a} satisfies F if the formula F evaluates to ⊤ in the standard
semantics of Booleans and reals. We call M a model of F and denote this with
M ⊨ F. If there is such a variable assignment, we say that F is satisfiable,
otherwise it is unsatisfiable. If two models M_1 and M_2 agree on the values of
their common variables, we denote the model that combines M_1 and M_2 with
M_1 ∪ M_2.


Definition 1 (Craig interpolant). Given two formulas A(x, y) and B(y, z)
such that A ∧ B is unsatisfiable, a Craig interpolant is a formula I(y) such that
A ⇒ I and I ⇒ ¬B. We call the pair (A, B) an interpolation problem.


Model Checking. A state-transition system is a pair G = ⟨I, T⟩, where I(x) is a
state formula describing the initial states and T(x, x′) is a state-transition
formula describing the system’s evolution. Given a state formula P (the property),
we want to determine whether all reachable states of G satisfy P. If this is the
case, P is an invariant of G. If P is not invariant, there is a concrete trace of
the system, called a counter-example, that reaches ¬P.

The direct way to prove that a property P is an invariant of G is to show
that it is inductive. This requires showing that P holds in the initial states:
I ⇒ P, and that it is preserved by transitions: P(x) ∧ T(x, x′) ⇒ P(x′). As
most invariants are not inductive, a key problem in model checking is to find an
inductive strengthening of P, that is, a property P′ such that P′ ⇒ P and P′ is
inductive.


² For example, x² − 2 has two roots. The first root −√2 is the smaller of the two and
the second root is √2.
Example 1 (Cauchy-Schwarz inequality). We can frame the Cauchy-Schwarz
inequality as a model-checking problem in nonlinear arithmetic. The inequality
is the following:

\[ \Big( \sum_{i=1}^{n} x_i y_i \Big)^2 \le \Big( \sum_{i=1}^{n} x_i^2 \Big) \Big( \sum_{i=1}^{n} y_i^2 \Big) \tag{1} \]

As shown in [21], many inequalities that involve a discrete parameter (such as
n above) can be converted to model-checking problems. For inequality (1), we
construct the transition system G_cs = ⟨I, T⟩ where

\[ I = (S_1 = 0) \wedge (S_2 = 0) \wedge (S_3 = 0), \]
\[ T = (S_1' = S_1 + x y) \wedge (S_2' = S_2 + x^2) \wedge (S_3' = S_3 + y^2). \]

The variables S_1, S_2, S_3 correspond to the sums in (1) in order. The two variables
x and y of G_cs model the variables x_i and y_i from (1) in each iteration of G_cs.
Proving the inequality amounts to showing that property P_cs = (S_1² ≤ S_2 S_3)
is an invariant of G_cs. Property P_cs is not inductive on its own, but property
P′_cs = P_cs ∧ (S_2 ≥ 0) ∧ (S_3 ≥ 0) is an inductive strengthening of P_cs.
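
The claims of Example 1 can be checked on concrete values with a short Python sketch. This is only an illustration with exact rational arithmetic, not a proof and not part of the model checker: the function names are hypothetical, the state `bad` is a hand-picked unreachable state, and the random simulation merely samples traces.

```python
from fractions import Fraction
import random


def step(state, x, y):
    """One transition of G_cs: accumulate x*y, x^2 and y^2."""
    s1, s2, s3 = state
    return (s1 + x * y, s2 + x * x, s3 + y * y)


def p_cs(state):
    """Property P_cs: S1^2 <= S2 * S3."""
    s1, s2, s3 = state
    return s1 * s1 <= s2 * s3


def p_cs_strong(state):
    """Strengthening P'_cs: P_cs and S2 >= 0 and S3 >= 0."""
    _, s2, s3 = state
    return p_cs(state) and s2 >= 0 and s3 >= 0


# P_cs alone is not inductive: `bad` satisfies P_cs (1 <= 1) but its
# successor (2, 0, 0) violates it (4 <= 0 fails). The strengthening
# excludes `bad` because S2 and S3 are negative there.
bad = (1, -1, -1)
succ = step(bad, 1, 1)
cti_found = p_cs(bad) and not p_cs(succ)


def simulate(n_steps, seed=0):
    """Run one random trace from the initial state and check P'_cs at each step."""
    rng = random.Random(seed)
    state = (Fraction(0), Fraction(0), Fraction(0))
    for _ in range(n_steps):
        x = Fraction(rng.randint(-50, 50), rng.randint(1, 10))
        y = Fraction(rng.randint(-50, 50), rng.randint(1, 10))
        state = step(state, x, y)
        if not p_cs_strong(state):
            return False
    return True
```

The state `bad` is what model checkers usually call a counter-example to induction: it is not reachable from the initial state, yet it prevents a direct inductive proof of P_cs.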


Many modern model-checking techniques, specifically those based on SMT 
solving, use interpolation as a tool to automatically infer inductive invariants. In 
this context, an interpolant can be used to over-approximate a transition in the 
context of a spurious counter-example. In addition to interpolation, the recent 
class of techniques broadly termed property-directed reachability (PDR) (e.g., 
[24,30,33]), relies on model generalization, which converts a concrete counter- 
example state into a set of counter-examples. 


Definition 2 (Generalization). Given a formula F(x, y) such that F is true
in a model M, we call a formula G(x) a generalization of M if G(x) is true in
M and G(x) ⇒ ∃y. F(x, y).


A PDR model-checking procedure for nonlinear arithmetic requires both an 
interpolation and a generalization procedure. 


3 SMT Modulo Models and Interpolation 


SMT solvers typically provide an API to assert formulas and to check the sat- 
isfiability of asserted formulas. We denote with SOLVER::ASSERT(F) the solver 
method that adds the formula F to the set of assertions to be checked by the 
solver. We denote with SOLVER::CHECK() the solver method for checking satis- 
fiability, with the following contract. 


SOLVER::CHECK(): Check satisfiability of asserted formulas A and


1. if there is a model M such that M ⊨ A, return ⟨sat, M⟩;
2. otherwise return ⟨unsat, ∅⟩.
In this contract, the solver does not return any form of inconsistency
certificate when the assertions are unsatisfiable.³ We generalize the standard SMT
satisfiability checking to SMT modulo models as follows.


SOLVER::CHECK(M_0): Check satisfiability of asserted formulas A and


1. if there is a model M ⊇ M_0 such that M ⊨ A, return ⟨sat, M, ⊤⟩;
2. otherwise return ⟨unsat, ∅, I⟩ where A ⇒ I and M_0 ⊨ ¬I.


SMT modulo models allows one to check that a formula is satisfiable modulo
a partial model M_0, by seeking a solution that extends M_0. If there is no such
solution, the formula I returned as the certificate of unsatisfiability is a model
interpolant: it is implied by the assertions and inconsistent with M_0 (i.e., I
evaluates to ⊥ in the model M_0). If we restrict ourselves to Boolean formulas,
SMT modulo models reduces exactly to solving modulo assumptions [17] used 
in the SAT community. Although this idea is not completely new, it is the first 
time that it is used for interpolation in SMT, as far as we know. 


3.1 Interpolation 


Before diving into an approach that can support the above mode of satisfiability 
checking, we first show how model interpolation can be used to devise a general 
interpolation method. 


Algorithm 1: INTERPOLATE(A, B)

    S_A.assert(A)
    S_B.assert(B)
    I ← ⊤
    while true do
        ⟨r_B, M_B⟩ ← S_B.check()
        if r_B = unsat then
            return ⟨unsat, I⟩
        ⟨r_A, M_A, I_A⟩ ← S_A.check(M_B)
        if r_A = sat then
            return ⟨sat, M_A ∪ M_B⟩
        I ← I ∧ I_A
        S_B.assert(I_A)


Algorithm 1 shows the pseudocode of a procedure that checks satisfiability
and interpolates two formulas A and B. The basic idea is simple: we enumerate
models M_k of the formula B, and refute each model M_k with a model interpolant
I_k from A. If the process converges and returns unsat, we collect the model
interpolants and construct the final interpolant I = ⋀ I_k. Each interpolant I_k is
implied by A because it is a model interpolant, so A ⇒ I. Each model of B is
refuted by some model interpolant I_k, and so I ⇒ ¬B. On the other hand, if the
process returns sat, the procedure has found a common model for A and B. The
procedure above is model-driven and modular, in that it checks the formulas
A and B independently while only communicating models (from B to A) and
model interpolants (from A to B).


³ Some solvers support proof generation. While proofs are fundamentally important,
we are interested in certificates that can always be computed and are useful in
supporting further analysis. For example, proof generation for nonlinear arithmetic
is still a hard open problem.


Lemma 1 (Correctness). If INTERPOLATE(A, B) returns ⟨unsat, I⟩ then A ∧ B
is unsatisfiable and I is an interpolant for (A, B). If INTERPOLATE(A, B) returns
⟨sat, M⟩ then A ∧ B is satisfiable and M is a model of both A and B.
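
The loop of Algorithm 1 can be sketched in Python on a propositional toy. Everything below is an assumption made for illustration: `ToySolver` is a hypothetical brute-force stand-in for a solver (formulas are CNF over DIMACS-style integer literals), and its model interpolants are obtained by greedily shrinking the partial model to a conflicting core, which is not how MCSAT computes them. Only the values of the shared variables are passed from B to A.

```python
from itertools import product


class ToySolver:
    """Brute-force propositional stand-in for an SMT solver (hypothetical)."""

    def __init__(self):
        self.clauses = []  # CNF: each clause is a frozenset of int literals

    def assert_formula(self, clauses):
        self.clauses.extend(frozenset(c) for c in clauses)

    def _extend(self, partial):
        """Return a total model extending `partial`, or None if there is none."""
        variables = {abs(l) for c in self.clauses for l in c} | set(partial)
        free = sorted(v for v in variables if v not in partial)
        for bits in product([False, True], repeat=len(free)):
            model = {**partial, **dict(zip(free, bits))}
            if all(any(model[abs(l)] == (l > 0) for l in c) for c in self.clauses):
                return model
        return None

    def check(self, m0=None):
        """Contract of SOLVER::CHECK(M_0): (sat, M, None) or (unsat, None, I)."""
        m0 = dict(m0 or {})
        model = self._extend(m0)
        if model is not None:
            return "sat", model, None
        core = dict(m0)
        for v in list(core):  # drop assignments that the conflict does not need
            trial = {u: b for u, b in core.items() if u != v}
            if self._extend(trial) is None:
                core = trial
        # The clause negating the core is implied by the assertions and false in m0.
        return "unsat", None, frozenset(-v if b else v for v, b in core.items())


def interpolate(a, b, shared):
    """Algorithm 1; the interpolant is returned as a list of clauses (a conjunction)."""
    s_a, s_b = ToySolver(), ToySolver()
    s_a.assert_formula(a)
    s_b.assert_formula(b)
    interp = []  # empty conjunction, i.e. true
    while True:
        r_b, m_b, _ = s_b.check()
        if r_b == "unsat":
            return "unsat", interp
        m_shared = {v: m_b[v] for v in shared if v in m_b}
        r_a, m_a, i_a = s_a.check(m_shared)
        if r_a == "sat":
            return "sat", {**m_b, **m_a}
        interp.append(i_a)
        s_b.assert_formula([i_a])


# Variables: x = 1, y = 2 (shared), z = 3.
A = [[1], [-1, 2]]        # x and (not x or y)
B = [[-2, 3], [-3]]       # (not y or z) and not z
unsat_result = interpolate(A, B, shared={2})
sat_result = interpolate(A, [[-2, 3]], shared={2})
```

On the first pair the sketch needs a single model interpolant, the unit clause y, which is implied by A and inconsistent with B. Termination is trivial here because there are finitely many Boolean models; in general it depends on the quality of the model interpolants.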


Note that Lemma 1 does not claim termination of the procedure. Termination 
depends on the ability of model interpolation to produce a finite number of 
model interpolants that can eliminate a potentially infinite number of models. 
A naive approach to check a formula A(x, y) for satisfiability modulo a model
M_0 = {y ↦ v} is to use an interpolating SMT solver. First, encode the model
into a formula F_M = ⋀ (y_i = v_i). If the formula A ∧ F_M is satisfiable in a model
M, so is A, and M ⊇ M_0. Otherwise, we compute the interpolant I of A and F_M.
This naive approach satisfies the requirements of SOLVER::CHECK(M_0), but it
is limited for the following reasons. First, theories such as nonlinear arithmetic
have complex models and the formula F_M can be hard to express. As an example,
x ↦ √2 can only be expressed by extending the constraint language to support
algebraic numbers, or by using additional assertions such as (x² = 2) ∧ (x > 0).
More importantly, traditional interpolation provides no guarantees in terms of
convergence of a sequence of interpolation problems. For example, as already
noted in [42], ¬F_M would be a valid interpolant for A and F_M. But such an
interpolant only eliminates a single model and could, in general, lead to
non-termination of INTERPOLATE(A, B). To tackle this issue, we require that the
procedure SOLVER::CHECK() produces interpolants general enough to disallow
such infinite sequences of model interpolants. We do this by adopting the
convergence approach and terminology of [42] to model interpolation as follows.


Definition 3 (Model Interpolation Sequence). Given a formula A(x, y),
a sequence of models (M_k) of y, and a sequence of formulas (I_k) over y, we
call (I_k) a model interpolation sequence for A and (M_k) if for all k it holds that


1. M_k is consistent with ⋀_{i<k} I_i;
2. M_k is inconsistent with A;
3. I_k is a model interpolant between A and M_k.


Definition 4 (Finite Convergence). We say that SOLVER::CHECK() has the 
finite convergence property if it does not allow infinite model interpolation 
sequences. 


Lemma 2 (Termination). If SOLVER::CHECK() has the finite convergence
property, then INTERPOLATE(A, B) always terminates. 
3.2 SMT Modulo Models with MCSAT


We build a procedure for solving SMT modulo models by modifying the satis- 
fiability checking procedure of MCSAT. The MCSAT method for SMT solving 
was introduced in [13,28] and further extended in [27]. We give a brief overview 
of the MCSAT terminology and mechanics, and we describe the satisfiability 
procedure. We emphasize modifications to the original MCSAT procedure that 
are needed for solving SMT modulo models. 

The architecture of an MCSAT solver consists of a core solver, an assignment 
trail, and reasoning plugins. The core solver drives the overall solving process, 
and is responsible for dispatching notifications and handling requests from the 
plugins. The solver trail is a chronological record that tracks assignments of 
terms to values. It is shared by the core solver and the reasoning plugins. The 
reasoning plugins are modules dedicated to handling specific theory terms and 
constraints (e.g., clauses for Booleans, polynomial constraints for arithmetic). A 
plugin reasons about the content of the solver trail with respect to the set of 
currently relevant terms. In the context of nonlinear arithmetic problems, the 
reasoning plugins are the arithmetic plugin and the Boolean plugin. The most 
important role of the core solver is to perform conflict analysis when one of the 
reasoning plugins detects a conflicting state. 

When formulas F_1, …, F_n are asserted, by calling SOLVER::ASSERT(F_i), the
core solver notifies all plugins of the asserted formulas. The plugins analyze the
formulas and report all relevant terms back to the core. The relevant terms
are the variables and subterms of the formulas F_i that need to be consistently
assigned to ensure a satisfying assignment. In nonlinear arithmetic, relevant
terms are all variables, arithmetic constraints, and non-negated Boolean terms
that appear in the input formula (or are part of a learnt clause). Once the
relevant terms are collected, the core solver adds the assertions to the trail. The
initial trail then contains the partial assignment F_i ⇝ ⊤, and the search for a
full satisfying assignment starts from this trail.


Solver Trail and Evaluation. The assignment trail is the central data structure 
in the MCSAT framework. It is a generalization of the Boolean assignment trail 
used in modern CDCL SAT solvers. The trail records a partial (and potentially 
inconsistent) model that assigns values to relevant terms. If the satisfiability 
algorithm terminates with a sat answer, the full satisfying assignment can be 
read off the trail. At any point during the search, the trail can be used to evaluate 
any relevant compound term based on the values of its sub-terms. A term t (and
¬t, if Boolean) can be evaluated in the trail M if t itself is assigned in M, or if all
closest relevant sub-terms of t are assigned in M (and its value can therefore be
computed). As the search progresses, it is possible for some terms to be evaluated
in two different ways, which can result in a conflict (i.e., a term assigned different
values). In order to account for this ambiguity, we define an evaluation predicate
evaluates[M](t, v) that returns true if the term t can evaluate to the value v in
trail M.
Algorithm 2: MCSAT::CHECK(x ↦ v)

    Data: solver trail M, relevant variables/terms to assign in queue
    while true do
        unitPropagate()
        if a plugin detected a conflict and the conflict clause is C then
            ⟨C, final⟩ ← analyzeConflict(M, C, x)
            if final then
                I ← analyzeFinal(M, C)
                return ⟨unsat, I⟩
            else backtrackWith(M, C)
        else
            if exists x_i ∈ x unassigned in M then
                ownerOf(x_i).decideValue(x_i, v_i)
            else
                if queue.empty() then return ⟨sat, M⟩
                x ← queue.pop()
                if x is unassigned then ownerOf(x).decideValue(x)


Conflicts and Conflict Clauses. One of the main responsibilities of reasoning 
plugins is to ensure that the trail is consistent at any point in the search. A 
trail is evaluation consistent if no relevant term can evaluate to two different 
values, as described above. A trail is unit consistent if every relevant term can 
be given a value without making the trail evaluation inconsistent. If the trail is 
not evaluation consistent or unit consistent, the trail is in conflict. 

Trail consistency is a generalization of the consistency that CDCL SAT 
solvers enforce during their search. By unit propagation, a SAT solver ensures 
that, if no conflict has been detected, no clause can be falsified by assigning a 
single variable (i.e., no clause evaluates to both ⊤ and ⊥). In the MCSAT
framework, the plugins do the same: they keep track of unit constraints and reason
about the consistency of the trail. It is the responsibility of the plugin to report
conflicts. Each conflict must be accompanied by a valid conflict clause that
explains the inconsistency.⁴ A clause C = (L_1 ∨ … ∨ L_n) is a conflict clause in
a trail M if each literal L_i can evaluate to ⊥ in M, i.e., if evaluates[M](L_i, ⊥).


Example 2. Consider the constraint C = (x² + y² < 1) with the set of relevant
terms {C, x, y}, and the following solver trails

    M_1 = [C ⇝ ⊤, x ↦ 0],    M_2 = [C ⇝ ⊤, x ↦ 0, y ↦ 0],
    M_3 = [C ⇝ ⊤, x ↦ 1],    M_4 = [C ⇝ ⊤, x ↦ 1, y ↦ 0].

The trails M_1 and M_2 are consistent, the trail M_3 is unit inconsistent (no
consistent assignment for y exists), and M_4 is evaluation inconsistent (C evaluates to
both ⊤ and ⊥). A valid explanation for the inconsistency of M_3 is the conflict
clause C_3 = ¬C ∨ (x < 1), while a valid explanation for the inconsistency of M_4
is the conflict clause C_4 = ¬C ∨ C. Although C_4 is a tautology, it is an acceptable
conflict clause since both literals can evaluate to ⊥ (because evaluates[M_4](C, ⊤)
and evaluates[M_4](C, ⊥)).


⁴ By valid here we mean that the clause is a universally true statement on its own.
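
The classification of the four trails in Example 2 can be reproduced with a small Python sketch. It is hard-wired to the single constraint C = (x² + y² < 1), represents a trail as a plain dictionary, and only handles the unit case where C is true and y is the unassigned variable; the function names are hypothetical and nothing here reflects the data structures of an actual MCSAT solver.

```python
def evaluations_of_c(trail):
    """Truth values that C = (x^2 + y^2 < 1) can evaluate to in `trail`."""
    values = set()
    if "C" in trail:
        values.add(trail["C"])  # C is assigned directly on the trail
    if "x" in trail and "y" in trail:
        # All sub-terms are assigned, so the value of C can be computed.
        values.add(trail["x"] ** 2 + trail["y"] ** 2 < 1)
    return values


def classify(trail):
    """Return the consistency status of a trail over the terms {C, x, y}."""
    if len(evaluations_of_c(trail)) > 1:
        return "evaluation inconsistent"
    # C is unit in y: some y with y^2 < 1 - x^2 exists iff 1 - x^2 > 0.
    if trail.get("C") is True and "x" in trail and "y" not in trail:
        if 1 - trail["x"] ** 2 <= 0:
            return "unit inconsistent"
    return "consistent"


M1 = {"C": True, "x": 0}
M2 = {"C": True, "x": 0, "y": 0}
M3 = {"C": True, "x": 1}
M4 = {"C": True, "x": 1, "y": 0}
statuses = [classify(m) for m in (M1, M2, M3, M4)]
```

For M_4 the set returned by `evaluations_of_c` contains both truth values, which is exactly the situation in which the tautological clause ¬C ∨ C is accepted as a conflict clause.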


Main Procedure. The implementation of the satisfiability checking procedure 
SOLVER::CHECK() is a generalization of the search-and-resolve loop of modern 
SAT solvers (see, e.g., [16,17]). The procedure is shown in Algorithm 2, where
we emphasize the extensions needed for SMT modulo models in red. The overall
procedure performs a direct search for a satisfying assignment and terminates
either by finding an assignment that extends the given partial model, or by deducing
that the problem is unsatisfiable as certified by an appropriate model interpolant.

The main elements of the procedure are unit propagation and decisions, 
used for constructing the assignment, and conflict analysis for repairing the 
trail when it becomes inconsistent. The unitPropagate() procedure invokes 
the propagation procedures provided by the plugins. Propagation allows each 
plugin to add new assignments to the top of the trail. If, during propagation, a 
plugin detects an inconsistency, it reports the conflict to the core solver along 
with a valid conflict clause. The decideValue(x) procedure assigns a value to
the given unassigned term x. Decisions are performed only after propagation
has fully saturated with no reported conflicts, which means that the trail is unit 
consistent. In such a trail, an assignment for x is guaranteed to exist, but the 
choice of a particular value is delegated to the plugin responsible for x (e.g., the 
arithmetic plugin for real-typed terms). 


Modification 1 (Decisions). To support SMT modulo a model x ↦ v,
variables x_i ∈ x of the input model are decided before any other term, and
are assigned the provided value v_i. The procedure that performs this decision
is denoted with decideValue(x_i, v_i). If a decision introduces an evaluation
inconsistency, the plugin reports the conflict with a conflict clause.


Detecting and explaining decision conflicts is straightforward: there must exist a
single constraint C that can evaluate to both ⊤ and ⊥ in the trail. Such conflicts
can always be explained with a clause of the form (¬C ∨ C).

If a conflict is reported, either during propagation or in a decision, the pro- 
cedure invokes the conflict analysis procedure analyzeConflict(). This proce- 
dure takes the reported conflict clause C and finds the root cause of the conflict. 
The analysis backtracks the trail, element by element, so long as C is a conflict 
clause, while resolving any trail propagations from C. Once done, the analysis 
returns the clause along with the flag that indicates whether this conflict clause 
C is empty (indicating the final conflict). If the conflict is not final, the proce- 
dure calls backtrackWith() to backtrack the trail further, if possible, and add 
a new assignment to the trail, ensuring progress and fixing the conflict. The 
main invariant of the conflict resolution procedure is that the conflict clause C 
is always implied by asserted formulas. 
Modification 2 (Conflict Analysis). To support SMT modulo a model
x ↦ v, the analysis procedure analyzeConflict(M, C, x) stops as soon
as it encounters a variable x_i ∈ x to resolve, and returns ⟨C, true⟩.


This modification is based on the fact that the variables x_i have a fixed value
given by the model. Assume that conflict analysis attempts to undo a variable x_i
that is part of the provided model x ↦ v. This can only happen when the trail
consists of only variables from x and implications of asserted formulas. In other
words, this particular conflict cannot be resolved unless we modify either the 
assertions themselves or the input model. The clause resulting from the analysis 
marked as final is our starting point for producing the model interpolant. 


Modification 3 (Final Analysis). To support SMT modulo a model x ↦ v,
the procedure analyzeFinal(M, C) resolves any remaining trail
propagations in M from the clause C and returns the resulting clause I.


The resolution of propagations in this final analysis is done in the same manner 
as in regular conflict analysis. This means that the resulting clause I is implied 
by the asserted formulas. In addition, resolving all propagations from the conflict 
clause ensures that all literals of I evaluate to false only because of the assignment
x ↦ v, making I an appropriate model interpolant.


Example 3. Consider two formulas F_1 = b and F_2 = ¬b ∨ (x² + y² < 2).
When asserting these two formulas to the MCSAT solver, the Boolean and
arithmetic plugins will identify the set of terms relevant for satisfiability as
R = {b, x, y, (x² + y² < 2)}. Additionally, the assertions will be added to the
trail and propagated,⁵ resulting in the following initial trail

    M_0 = [b ⇝ ⊤, F_2 ⇝ ⊤, (x² + y² < 2) ⇝^{F_2} ⊤].

We now apply our procedure to solve F_1 and F_2 modulo the partial model
{x ↦ 2}.

In the first iteration, no term in R is unit (with only one variable unassigned),
and propagation does not infer any new facts or conflicts. The procedure thus
performs a decision on the unassigned variable x of the model, resulting in the trail

    M_1 = [b ⇝ ⊤, F_2 ⇝ ⊤, (x² + y² < 2) ⇝^{F_2} ⊤, x ↦ 2].

In the second iteration, as (x² + y² < 2) is unit in the trail M_1, the arithmetic
plugin examines the constraint and deduces that there is no potential solution
for y. This constitutes a unit inconsistency that the plugin reports, along with
the conflict clause⁶

    C_0 = ¬(x² + y² < 2) ∨ ¬(x > √2).

Conflict analysis takes clause C_0 and starts the resolution process. As the
top variable x on the trail M_1 is part of the input model, the analysis stops and
reports that the clause C_0 is the final explanation. This clause is valid, but not
yet a model interpolant as it contains a literal with variable y. We then proceed
with the final analysis to remove such literals. First, we resolve (x² + y² < 2)
from C_0 using its reason clause F_2, which gives the clause C_1 = ¬b ∨ ¬(x > √2).
Then, we resolve b from C_1 with an empty reason (b is an assertion), resulting
in the final clause and model interpolant I = ¬(x > √2).


⁵ Notation t ⇝^F v denotes that t is assigned to v due to propagation, and F is the
reason of the propagation.
⁶ We use (x > √2) as a shorthand for the extended constraint x >_r root(x² − 2, 2, x).
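
The two resolution steps of the final analysis in Example 3 can be replayed with a minimal Python sketch. Literals are opaque strings with a "~" prefix for negation, `resolve` is a hypothetical helper implementing plain propositional resolution, and the assertion b is treated as the unit clause F_1 (the text above describes this step as resolving with an empty reason).

```python
def negate(lit):
    """Negate a literal written as a string, using '~' as the negation prefix."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(clause, reason, pivot):
    """Resolve `clause` (which contains ~pivot) with `reason` (which contains pivot)."""
    assert negate(pivot) in clause and pivot in reason
    return (clause - {negate(pivot)}) | (reason - {pivot})


CIRCLE = "x^2+y^2<2"   # the arithmetic constraint of F_2
FAR = "x>sqrt(2)"      # shorthand for the extended constraint of footnote 6

F1 = frozenset({"b"})
F2 = frozenset({"~b", CIRCLE})
C0 = frozenset({negate(CIRCLE), negate(FAR)})  # reported conflict clause

C1 = resolve(C0, F2, CIRCLE)   # removes the literal that mentions y
I = resolve(C1, F1, "b")       # removes the asserted Boolean b
```

The resulting clause I contains only the literal over x, so its truth value is determined by the partial model {x ↦ 2} alone, where it is false.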


4 Nonlinear Arithmetic 


The general approach to interpolation presented so far is not specific to nonlinear 
arithmetic. We now tackle two practical issues that arise in nonlinear arithmetic 
and we discuss the properties of our interpolation procedure in the context of 
nonlinear arithmetic. First, on nonlinear problems, as seen in Example 3, the 
interpolation procedure can return model interpolants that include extended 
polynomial constraints. This is an artifact of the underlying decision procedure 
(such as NLSAT [29]) that might use extended polynomial constraints to suc- 
cinctly represent conflict explanations. While such constraints make decision 
procedures more effective, they are undesirable for interpolation: interpolants 
should be described in the language of the input formulas, if possible. Second, 
to use the interpolation procedure in the context of model checking, we also need
to devise a generalization procedure for polynomial constraints. 

This section uses concepts from cylindrical algebraic decomposition (CAD). 
We keep the presentation example-driven and focused on our particular needs, 
and refer the reader to the existing literature for further information [2,5,7]. 
Cylindrical algebraic decomposition is a general approach for reasoning about 
polynomials based on the following result due to Collins [10]. For any set of
polynomials f_1, …, f_k ∈ Z[x_1, …, x_n] one can algorithmically decompose R^n
into connected regions (called cells) such that all the polynomials f_i are
sign-invariant in every cell C_j. This means that the cells also maintain the truth value
of any polynomial constraints over the polynomials f_i, which is crucial in many
reasoning techniques for polynomial constraints.

The theory and practice of CAD is heavily dependent on the ordering of
variables involved. For this paper we always assume the CAD order to be the
same as the order of the defined polynomials (e.g., x_1 < x_2 < … < x_n). Every
CAD cell is cylindrical in nature, and can be described by constraints where 
every dimension of the cell (called a level) can be completely defined by relying 
only on the previous dimensions. We illustrate this through an example. 


Example 4. Consider the polynomial f = x? + y? — 2 € Z[x,y]. A CAD of f is 
depicted in Fig. 1 (left). The cell C4 is defined by two constraints: 


C} = y >, root(z? + y’ — 2,2,y), 
OC? = >, root(z? —2,1,2) A £ <, root(a” — 2,2, £). 
Fig. 1. CAD of the polynomial $f = x^2 + y^2 - 2$ from Example 4 (left). Computed cell
capturing the model (1,2) of Example 5 (right).


Constraint $C_1^x$ is at the first level (it is a constraint on $x$ only), while constraint
$C_1^y$ is at the second level and relates variables $x$ and $y$. The full cell description
is then $C_1 \equiv C_1^x \wedge C_1^y$. The green cell $C_2$ can be described by $C_2^y \equiv \top$ and
$C_2^x \equiv x >_r \mathrm{root}(x^2 - 2, 2, x)$, with the full description $C_2 \equiv C_2^x \wedge C_2^y$.
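
The two cell descriptions of Example 4 can be checked numerically. The following Python sketch is only an illustration: the helper names `in_cell_c1` and `in_cell_c2` are ours, and the root objects are evaluated in floating point, whereas an actual CAD implementation would treat them symbolically.

```python
import math

SQRT2 = math.sqrt(2.0)


def f(x, y):
    """The polynomial f = x^2 + y^2 - 2 from Example 4."""
    return x * x + y * y - 2.0


def in_cell_c1(x, y):
    """Cell C1: x strictly between the two roots of x^2 - 2, y above the circle."""
    if not (-SQRT2 < x < SQRT2):           # level x: root 1 < x < root 2
        return False
    upper_root = math.sqrt(2.0 - x * x)    # second real root of f(x, .) in y
    return y > upper_root                  # level y depends only on x


def in_cell_c2(x, y):
    """Cell C2: x to the right of the second root of x^2 - 2, y unconstrained."""
    return x > SQRT2


def sign(v):
    return (v > 0) - (v < 0)


c1_points = [(-1.0, 1.5), (0.0, 2.0), (1.0, 2.0), (1.4, 0.5)]
c2_points = [(1.5, -3.0), (2.0, 0.0), (5.0, 7.0)]

# Collect the sign of f on the sample points that fall inside each cell.
signs_c1 = {sign(f(x, y)) for (x, y) in c1_points if in_cell_c1(x, y)}
signs_c2 = {sign(f(x, y)) for (x, y) in c2_points if in_cell_c2(x, y)}
```

On these samples both sets contain a single sign, which is what sign-invariance of f on a cell requires. Note how the level-y test only needs the value of x, reflecting the cylindrical structure of the description.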


Model-based decision procedures such as NLSAT rely on CAD construction 
but do not construct the complete CAD decomposition. Instead, given a point 
in $\mathbb{R}^n$ they can construct a single cell of a CAD in a model-driven fashion. For
more information about this approach, we refer the reader to [4,27]. For our 
purposes we abstract the cell construction, and denote with describeCell(F, M) 
the function that, given a set of polynomials F, returns a description of a CAD 
cell of F that contains the model M. 

Following the terminology used in CAD, we say that a non-empty connected
subset of $\mathbb{R}^k$ is a region. A set of polynomials $\{f_1, \ldots, f_s\} \subset \mathbb{Z}[\vec{y}, x]$, with $\vec{y} =
(y_1, \ldots, y_n)$, is said to be delineable in a region $S \subset \mathbb{R}^n$ if for every $f_i$ (and $f_j$)
from the set, the following properties are invariant for any $\alpha \in S$:


1. the total number of complex roots of $f_i(\alpha, x)$;
2. the number of distinct complex roots of $f_i(\alpha, x)$;
3. the number of common complex roots of $f_i(\alpha, x)$ and $f_j(\alpha, x)$.


Delineability has important consequences on the number and arrangement of 
real roots of polynomials f;. As explained by the following theorem, if a set of 
polynomials F is delineable on a region S, then the number of real roots of the 
polynomials does not change on S. Moreover, these roots maintain their relative 
order on the whole of S. 
Theorem 1 (Corollary 8.6.5 of [40]). Let $F$ be a set of polynomials in
$\mathbb{Z}[\vec{y}, x]$, delineable in a region $S \subset \mathbb{R}^n$. Then, the real roots of $F$ vary continuously
over $S$, while maintaining their order.


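For a polynomial $f \in \mathbb{Z}[\vec{x}]$ and model $M = \{\vec{x} \mapsto \vec{v}\}$, we denote with
$\mathrm{sgncstr}(f, M)$ the polynomial constraint that matches the sign of $f$ in $M$, i.e.,


$$\mathrm{sgncstr}(f, M) = \begin{cases}
f < 0 & \text{if } \mathrm{sgn}(f(\vec{v})) < 0, \\
f > 0 & \text{if } \mathrm{sgn}(f(\vec{v})) > 0, \\
f = 0 & \text{if } \mathrm{sgn}(f(\vec{v})) = 0.
\end{cases}$$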
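
The definition of sgncstr translates directly into code. In the sketch below, a polynomial is represented by a Python function together with a printable name; this representation, and the name `sgncstr` as a Python function, are assumptions made for illustration and do not reflect how the solver stores polynomials.

```python
def sgncstr(f, text, model):
    """Return the sign constraint on `f` (printed as `text`) that holds in `model`."""
    value = f(**model)
    if value < 0:
        return f"{text} < 0"
    if value > 0:
        return f"{text} > 0"
    return f"{text} = 0"


model = {"x": 1, "y": 2}
c_circle = sgncstr(lambda x, y: x * x + y * y - 2, "x^2 + y^2 - 2", model)
c_proj = sgncstr(lambda x, y: x * x - 2, "x^2 - 2", model)
c_zero = sgncstr(lambda x, y: y - 2 * x, "y - 2x", model)
```

For the model {x ↦ 1, y ↦ 2}, the three calls produce one constraint of each shape (positive, negative, and zero).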


As described above, a CAD cell can be succinctly described by relying on 
extended polynomial constraints. We now show that the description of the cell 
can be reduced to basic polynomial constraints. 


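Lemma 3. Let $f_i \in \mathbb{Z}[y_1, \ldots, y_n, x]$, $i \in \{1, 2\}$, be two polynomials of degrees $m_i$, and $F_i =
x \nabla_r \mathrm{root}(f_i, k_i, x)$ be extended polynomial constraints of a cell description. Let
$S$ be a region of $\mathbb{R}^n$ where $\{f_1, f_2\}$ are delineable and let $M = \{\vec{y} \mapsto \vec{v}, x \mapsto a\}$
be a model such that $\vec{v} \in S$. Then, for all $\vec{y} \in S$ it holds that


$$\bigwedge_{i=0}^{m_1-1} \mathrm{sgncstr}(f_1^{(i)}, M) \wedge \bigwedge_{i=0}^{m_2-1} \mathrm{sgncstr}(f_2^{(i)}, M) \Rightarrow F_1 \wedge F_2.$$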


The proof of this lemma is relatively straightforward. The CAD cell description
for level x represents an entry in the sign table of $f_1$ and $f_2$ (with no roots
in between). A part of this sign table entry that contains M can be described
with the signs of all the derivatives of $f_1$ and $f_2$ as long as we can guarantee that
neither the arrangement nor the number of roots of $f_1$ and $f_2$ changes. But this is
guaranteed by $f_1$ and $f_2$ being delineable on S, so the lemma holds.

As a corollary to this lemma, in the context of CAD cell construction around 
a model M, we can replace any extended constraints describing a cell C with 
basic constraints stating that the signs of the polynomial derivatives are the same 
as in M. This results in a valid CAD subcell $C' \subseteq C$ for the same polynomials,
that still contains the model M. We denote the function that constructs a basic
CAD cell description of a set of polynomials F capturing the model M with
describeCellBasic(F, M).


Example 5. Based on Example 4, we can construct a cell around the model $M =
\{x \mapsto 1, y \mapsto 2\}$. Function describeCellBasic(F, M) will return the constraints


$$C_3^y \equiv (x^2 + y^2 > 2) \wedge (y > 0),$$
$$C_3^x \equiv (x^2 < 2) \wedge (x > 0).$$

The full cell description is then $C_3 \equiv C_3^x \wedge C_3^y$. Note that this cell is smaller than
the cell $C_1$ from Example 4. This reduction in size is generally undesirable, but
it is a price to pay for having the description in a simpler language.
Interpolation Without Extended Constraints. We now show how the cell construction
described above can be used to remove extended polynomial constraints
from a model interpolant. Assume a clausal model interpolant


$$I = (L_1 \vee \ldots \vee L_i \vee \ldots \vee L_n)$$


that is implied by formula A and refutes a model $M = \{\vec{x} \mapsto \vec{v}\}$, i.e., all literals
of $I$ evaluate to $\bot$ in M. Assume also that some literal $L_i$ contains an extended
polynomial constraint $x_n \nabla_r \mathrm{root}(f, k, x_n)$, with $f \in \mathbb{Z}[\vec{x}]$. We aim to replace the
extended literal $L_i$ with literals over basic polynomial constraints. To do so, we
need to find literals $L_i^1, \ldots, L_i^m$ such that $L_i \Rightarrow (L_i^1 \vee \ldots \vee L_i^m)$ and all literals
$L_i^j$ evaluate to $\bot$ in M. Then, the clause


$$I' = (L_1 \vee \ldots \vee L_i^1 \vee \ldots \vee L_i^m \vee \ldots \vee L_n)$$


will also be a model interpolant implied by A that refutes the model M. 

We can construct the literals $L_i^j$ using single cell construction as follows. We
create a description of the CAD cell of the polynomial $f$ from $L_i$ that captures
the model M. Let describeCellBasic($\{f\}$, M) $= D_1 \wedge \ldots \wedge D_m$ be this description.
Since the cell fully captures the behavior of $f$ around M, we know that $D_1 \wedge
\ldots \wedge D_m \Rightarrow \neg L_i$ and all literals $D_j$ evaluate to $\top$ in M. Therefore, we can use the cell
description to eliminate the extended literal $L_i$, obtaining the clause


$$I' = (L_1 \vee \ldots \vee \neg D_1 \vee \ldots \vee \neg D_m \vee \ldots \vee L_n)$$


By continuing this process, we can replace all extended literals from a model 
interpolant, to obtain a model interpolant in the basic language of polynomial 
constraints. 


Example 6. Consider the model interpolant $I = \neg(x >_r \mathrm{root}(x^2 - 2, 2, x))$ from
Example 3 that refutes the model $M = \{x \mapsto 2\}$. To express $I$ in terms of basic
polynomial constraints we first construct a regular CAD cell of $f = x^2 - 2$
around M. In this case this cell is simply $x >_r \mathrm{root}(x^2 - 2, 2, x)$. Then, we use
Lemma 3 to construct a basic CAD cell description as $(x^2 > 2) \wedge (x > 0)$. Finally,
the simplified interpolant is $I' = \neg(x^2 > 2) \vee \neg(x > 0)$.
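
The steps of Example 6 can be sketched in a few lines of Python. This is a simplified illustration restricted to a single univariate polynomial, so no projection or delineability check is needed; the function names (`describe_cell_basic`, `holds`, `interpolant`) and the coefficient-tuple representation are assumptions of this sketch, not the interface of the implementation described in this paper.

```python
from fractions import Fraction


def derivative(p):
    """Derivative of a univariate polynomial given as coefficients (c0, c1, ...)."""
    return tuple(i * c for i, c in enumerate(p))[1:]


def evaluate(p, x):
    """Evaluate p at x with Horner's rule, using exact rational arithmetic."""
    result = Fraction(0)
    for c in reversed(p):
        result = result * x + c
    return result


def describe_cell_basic(p, x0):
    """Sign conditions on p, p', ..., p^(m-1) at x0, as (polynomial, relation) pairs."""
    cell = []
    while len(p) > 1:                      # the m-th derivative is constant: skip it
        v = evaluate(p, x0)
        rel = "<" if v < 0 else ">" if v > 0 else "="
        cell.append((p, rel))
        p = derivative(p)
    return cell


def holds(constraint, x):
    """Check whether a (polynomial, relation) constraint is true at x."""
    p, rel = constraint
    v = evaluate(p, x)
    return {"<": v < 0, ">": v > 0, "=": v == 0}[rel]


f = (-2, 0, 1)                             # x^2 - 2
cell = describe_cell_basic(f, 2)           # [(x^2 - 2, ">"), (2x, ">")]


def interpolant(x):
    """I' = not(D_1) or ... or not(D_m): the clause obtained from the cell."""
    return any(not holds(d, x) for d in cell)
```

The resulting clause is false at the refuted model x = 2 and true at points such as x = 1 or x = -3, matching the behavior of the extended literal it replaces on those points.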


Termination. With the description of the interpolation procedure complete, we 
discuss the termination of the procedure. To do so, we fix the formula $A(\vec{x}, \vec{y})$
of Definition 3 and we assume a fixed order of variables that ensures $y_i < x_j$.
Since the MCSAT decision procedure on which we rely is based on CAD, we can
put a bound on the set of literals that can ever appear in a model interpolant
from the formula A to an arbitrary model M. Let $P_A$ be the set of polynomials
appearing in A, and let $P = \mathsf{P}(P_A)$ denote the closure of the set $P_A$ under the
CAD projection operator used by the decision procedure. Finally, let $P'$ be the
closure of $P$ under derivatives. The set of polynomial constraints that can appear
in the interpolant $I$ is limited to basic polynomial constraints over polynomials
in $P'$. This means that the procedure MCSAT::CHECK() can only generate a finite
number of model interpolants and therefore has the finite convergence property.
Lemma 4. Assuming a fixed variable order, the MCSAT::CHECK() procedure has
the finite convergence property for nonlinear arithmetic formulas. 


Together with Lemma 2, this lemma implies that our interpolation procedure 
for the theory of nonlinear arithmetic terminates. 
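
The finiteness argument rests on the fact that closing a finite set of polynomials under derivatives yields a finite set, because each derivative lowers the degree. The sketch below illustrates this for univariate polynomials only; it deliberately omits the CAD projection step, and the function name `derivative_closure` is hypothetical.

```python
def derivative(p):
    """Derivative of a univariate polynomial given as coefficients (c0, c1, ...)."""
    return tuple(i * c for i, c in enumerate(p))[1:]


def derivative_closure(polys):
    """Smallest set containing `polys` and closed under non-constant derivatives."""
    closure = set()
    work = [tuple(p) for p in polys]
    while work:
        p = work.pop()
        if p in closure or len(p) <= 1:    # already seen, or a constant: nothing to add
            continue
        closure.add(p)
        work.append(derivative(p))         # degree drops by one, so this terminates
    return closure


small = derivative_closure([(-2, 0, 1)])                    # x^2 - 2
larger = derivative_closure([(-2, 0, 1), (0, -1, 0, 1)])    # x^2 - 2 and x^3 - x
```

With a finite set of polynomials, only finitely many sign constraints of the form p < 0, p > 0, or p = 0 can be written, which is the bound used in the termination argument.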


Model Generalization. We now proceed to show how the CAD cell construction 
can be used in a natural way to provide model-driven generalization. As in 
Definition 2, assume a formula $F(\vec{x}, \vec{y})$ such that F is true in a model M. Our aim
is to construct a formula $G(\vec{x})$ that generalizes the model M and still guarantees
a solution to F.

Following the approach of [15], we do this in two steps. First, we construct 
an implicant B of F based on the model M. Then, we eliminate the variables 
y from B, again relying on the model M. The implicant B is a conjunction 
of literals that implies F and such that B is true in M. The implicant can be 
computed by a top-down traversal of the formula F while using the model M 
to evaluate the formula nodes (see, e.g., [15] for a detailed description). To find 
a formula G such that $G \Rightarrow \exists \vec{y} . B$, we use CAD cell construction as follows.
Let $P \subset \mathbb{Z}[\vec{x}, \vec{y}]$ be the set of all polynomials appearing in B, and let the cell
description of P around M be


describeCellBasic(P, M) $= D_x \wedge D_y$.


Here, $D_x$ denotes the description of cell levels of variables $\vec{x}$, while $D_y$ denotes
the description of cell levels of variables $\vec{y}$. Because of the cylindrical nature of
CAD cells, and the order on variables $y_i$ and $x_j$, we are guaranteed that every
solution of $D_x$ can be extended to a solution of $D_y$. Therefore we set the final
generalization $G(\vec{x}) = D_x$.


Example 7 (Generalization). Consider the formula $F = (x^2 + y^2 > 2)$ and the
model $M = \{x \mapsto 1, y \mapsto 2\}$ that satisfies F, and let us compute a generalization
$G(x)$ of M. First, we compute a CAD cell of $f = x^2 + y^2 - 2$ as shown in
Example 5. Then we drop the description of cell level y, to obtain the model
generalization $G(x) = (x^2 < 2) \wedge (x > 0)$.
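
The defining property of the generalization, namely that every solution of G extends to a solution of F, can be spot-checked on a grid. The sketch below reads F as x^2 + y^2 > 2, which is the constraint that the model {x ↦ 1, y ↦ 2} satisfies; the sample grids and the helper `extends` are assumptions chosen for illustration, and a grid check is of course not a proof.

```python
from fractions import Fraction


def F(x, y):
    return x * x + y * y > 2             # formula F, read as x^2 + y^2 > 2


def D_x(x):
    return x * x < 2 and x > 0           # level-x part of the cell from Example 5


def D_y(x, y):
    return x * x + y * y > 2 and y > 0   # level-y part of the same cell


def G(x):
    return D_x(x)                        # generalization: keep only the x levels


def extends(x, candidates):
    """Is there a y among the candidates that completes x to a point of the cell?"""
    return any(D_y(x, y) and F(x, y) for y in candidates)


xs = [Fraction(k, 10) for k in range(1, 15)]   # 0.1, ..., 1.4: all satisfy G
ys = [Fraction(k, 2) for k in range(-6, 7)]    # -3, -2.5, ..., 3
all_in_G = all(G(x) for x in xs)
all_extend = all(extends(x, ys) for x in xs)
```

Every sampled x in G has a witness y on the grid. The converse need not hold: G is an under-approximation, so points outside G (for example x = 3/2) may still have solutions of F.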


5 Evaluation 


To the best of our knowledge, there is no clear metric for evaluating how good 
an interpolant is, or for comparing different interpolants. In this section, we first 
show two examples to illustrate the procedure and its applications. Then, we 
evaluate the effectiveness of our interpolation procedure on practical problems 
that arise from model-checking applications. To this end, we integrate the pro- 
cedure into a model checker and evaluate whether the procedure is efficient, and 
can produce abstractions that help the model checker synthesize invariants and 
discover counter-examples. 
We have implemented the reasoning procedures (solving modulo partial models
and interpolation procedure) by extending the existing MCSAT implementation
of the YICES2 SMT solver [14]. We used the LIBPOLY library [31] for computing
the model generalization and simplification of algebraic cells. Since YICES2
is integrated into the SALLY model checker [30], we rely on the PDKIND method
[30] as the model checking engine (the user of interpolation) in our evaluation.




Fig. 2. Illustration of interpolants from Example 8. In blue and orange are the feasible 
space of the formulas A and B (projected on x and y). In green is the feasible space 
of the interpolant produced by our method (on the left) and the interpolant produced 
by [19] (on the right). (Color figure online) 


Example 8. We compare the style of interpolants generated by our new proce- 
dure with the ones generated by numerical approaches such as [19]. Example 4 
from [19] considers two formulas of the form 


$$A(x, y, a_1, a_2, b_1, b_2) = (f_1 \geq 0 \wedge f_2 \geq 0) \vee (f_3 \geq 0 \wedge f_4 \geq 0),$$
$$B(x, y, c_1, c_2, d_1, d_2) = (g_1 \geq 0 \wedge g_2 \geq 0) \vee (g_3 \geq 0 \wedge g_4 \geq 0).$$


The polynomials $f_i$ and $g_i$ involved in A and B are of degree 2. The right-hand
side of Fig. 2 shows the interpolant $I_1$ found by the approach in [19]. This interpolant
is of the form $h(x, y) > 0$, where $h$ is a polynomial of degree two computed
using semidefinite programming. Our approach, on the other hand, produces the
interpolant $I_2$ shown on the left-hand side of Fig. 2. This interpolant consists of
12 clauses, each containing 6-8 polynomial constraints over 16 different polynomials
(8 linear, 8 of degree 2). The interpolant $I_2$ is ultimately produced from
fragments of a CAD, so its edges touch upon the critical points of the shape they
were produced from (formula A). Interpolant $I_1$, on the other hand, has a simple
form dictated by the method of [19]. Which form is ultimately more useful depends
on the particular application.
                 | IC3-NRA                          | KIND                             | PDKIND
problem set      | solved  valid/invalid  time (s)  | solved  valid/invalid  time (s)  | solved  valid/invalid  time (s)
handcrafted (14) | 10      9/1            381       | 3       2/1            0         | 14      13/1           4
hycomp (7)       | …       …              …         | …       …              …         | …       …              …
hyst (65)        | 39      32/7           404       | 25      13/12          50        | 38      26/12          42
isat3 (1)        | 0       0/0            0         | 0       0/0            0         | 0       0/0            0
isat3-cfg (10)   | 8       6/2            14        | 9       6/3            9         | 10      7/3            8
nuxmv (2)        | 2       2/0            158       | 0       0/0            0         | 1       1/0            1118
sas13 (13)       | 10      5/5            13        | 5       0/5            0         | 13      8/5            7
tcm (2)          | …       …              …         | …       …              …         | 2       2/0            0

total            | 73      58/15          986       | 48      24/24          855       | 82      59/23          1971


Fig. 3. Evaluation Results. For each tool, we report the number of solved problems, 
how many of the solved problems were valid and invalid, and the total time used to 
solve them. The rows correspond to different problem classes, and the bottom row 
reports the overall results for all 114 benchmarks. 


Example 9 (Cauchy-Schwarz). As described in Example 1, we can model the
computation of the Cauchy-Schwarz inequality as a transition system $G_{cs}$. Then we
can prove the inequality correct if we can prove that the property $P_{cs}$ is valid in
$G_{cs}$. The PDKIND model checking engine with the new interpolation procedure
proves the property valid in 1s.


Benchmarks. We run the evaluation on an existing set of nonlinear model- 
checking problems used by Cimatti, et al. [8]. This set consists of 114 bench- 
marks from various sources: handcrafted benchmarks, hybrid system verification, 
NUXMV benchmarks, C floating-point verification, and verification of Simulink 
models. The benchmark problems all contain transition systems with nonlinear 
behavior. For each problem, the goal is to prove or disprove a single invariant. 
We refer the reader to [8] for a more detailed description. 


Evaluation. Cimatti, et al. [8] present an abstraction approach based on incrementally
more precise linear approximations of nonlinear polynomials. They
show that this approach, implemented in the IC3-NRA tool, is superior to other
tools (such as ISAT3 [36] and NUXMV [6] with upfront linear abstraction). Since
our goal is to show the effectiveness of our interpolation procedure, rather than
compare to many model checking engines, we keep the evaluation simple and
only compare to IC3-NRA. In addition, we include the k-induction engine KIND
of SALLY in the comparison to illustrate the importance of invariant inference
and counter-example generation.⁷

We ran the tools on the benchmark set with a 1h CPU timeout per problem. 
The results are shown in Fig. 3 and on the cactus plot in Fig. 4. A scatter plot
comparison of PDKIND against IC3-NRA and KIND is shown in Fig. 5. 


7 KIND performs k-induction checks for increasing values of k and stops if either the
property is shown k-inductive, or a counter-example is found.
Fig. 4. Cactus Plots Comparing the Performance of IC3-NRA, KIND, and PDKIND. The
x axis is the number of problems solved (valid on the left, invalid on the right) and the 
y axis is the time needed to solve the problem (log scale). 
Fig. 5. Scatter Plots Comparing the Performance of IC3-NRA and KIND with PDKIND. 
Green squares represent problems that are valid. Red dots represent problems that are 
invalid. Each axis represents the time it took the tool to solve the problem (log scale). 
(Color figure online) 


As can be seen from Fig. 3, the results are positive. The PDKIND engine with 
the new interpolation method can prove more properties and find more counter- 
examples than the state-of-the-art IC3-NRA. 

Out of 59 properties that PDKIND shows correct, 36 cannot be proved by 
KIND. This means that these properties are likely not k-inductive and that the 
interpolants produced by our procedure are valuable abstractions in invariant 
inference. Similarly, IC3-NRA proves 37 properties that are not k-inductive. As 
can be seen from the scatter plot in Fig. 5, there are properties that PDKIND can
prove that IC3-NRA cannot, and vice versa (11 and 10, respectively). This is to
be expected from a difficult domain, but it also means that the interpolation and
the abstraction approach (or other methods) can be used to complement each
other.

As for the invalid properties, since our interpolation method (and thus 
PDKIND) is based on complete and precise reasoning, while IC3-NRA relies on 
abstraction, it is to be expected that PDKIND can prove more properties invalid. 
Furthermore, the comparison with KIND in Fig. 5 shows that PDKIND finds all but
one of the counter-examples that KIND finds, in a similar amount of time. We see this as
a confirmation that the interpolation and generalization methods are effective, 
i.e., they do not impede the search for counter-examples. 


5.1 Related Work 


There is ample literature on interpolation for different fragments of nonlin- 
ear arithmetic. Existing methods can roughly be classified into two categories: 
approaches based on interval reasoning, and approaches based on semidefi- 
nite programming. Interval reasoning techniques (e.g., [20,35,36]) construct a 
proof of unsatisfiability through interval slicing and propagation. From such 
a proof, interpolants can be built using proof-based interpolation techniques. 
While incomplete, interval-based techniques can be very effective on problems 
that are hard for complete techniques. Moreover, they can support more than polynomial
functions (e.g., elementary functions, ODEs). Our procedure is complete,
but it is limited to the theories supported by MCSAT. The approaches based on 
semidefinite programming [12,18,19] generally approach the interpolation prob- 
lem by restricting both the fragment of arithmetic (e.g., bounded constraints, 
same set of variables, quadratic constraints) and the shape of the interpolant (a 
single polynomial constraint) so that the interpolant itself can be represented 
as a semidefinite optimization problem. When they apply, these procedures are 
also very effective but they suffer from numerical imprecision, requiring special 
care to account for these errors and making them difficult to use in formal verification.
In contrast, our procedure applies to nonlinear arithmetic as a whole.
It relies on symbolic techniques, which are not subject to numerical errors. It is 
precise and complete, and it produces clausal interpolants. 

The core ideas behind our model-based interpolation approach were presented
at the Boolean level as SAT solving with assumptions [17]. Closest to
our work is the work of Schindler and Jovanović [42] where a similar model- 
based approach to interpolation is applied to conjunctions of linear arithmetic 
constraints based on conflict resolution. Our work is more general as it applies 
to formulas other than conjunctions, and it is applicable to a wider range of 
theories. 


6 Conclusion and Future Work 


We have presented a general approach for interpolation in SMT. This novel approach
relies on a mode of interaction with the SMT solver that can check a
formula for satisfiability modulo a partial model and, if the formula is unsatisfiable,
can return a model interpolant that refutes the model. This allows us
to develop a first complete interpolation procedure for nonlinear arithmetic. We 
have implemented the new procedure in the YICES2 SMT solver and evaluated 
the interpolation procedure on model-checking problems. The new procedure 
seems to be effective in practice and opens new possibilities in the verification 
of systems that contain nonlinear behavior. Additionally, we show interesting 
examples of how the procedure can be used in automating induction proofs in 
mathematics. 

The interpolation procedure that we presented can support other theories 
available in MCSAT (e.g., uninterpreted functions [28], bit-vectors [22], nonlinear 
integer arithmetic [27]). We plan to explore interpolation in these theories in 
more detail, and in the contexts where interpolation can be beneficial (e.g., 
model checking, quantified reasoning, termination, and proof generation). 
Abstract. We present a novel length-aware solving algorithm for the
quantifier-free first-order theory over regex membership predicate and linear
arithmetic over string length. We implement and evaluate this algorithm
and related heuristics in the Z3 theorem prover. A crucial insight
that underpins our algorithm is that real-world regex and string formulas
contain a wealth of information about upper and lower bounds on
lengths of strings, and such information can be used very effectively to
simplify operations on automata representing regular expressions. Additionally,
we present a number of novel general heuristics, such as the prefix/suffix
method, that can be used to make a variety of regex solving
algorithms more efficient in practice. We showcase the power of our algorithm
and heuristics via an extensive empirical evaluation over a large
and diverse benchmark of 57256 regex-heavy instances, almost 75% of
which are derived from industrial applications or contributed by other
solver developers. Our solver outperforms five other state-of-the-art string
solvers, namely, CVC4, OSTRICH, Z3seq, Z3str3, and Z3-Trau, over this
benchmark, in particular achieving a speedup of 2.4x over CVC4, 4.4x
over Z3seq, 6.4x over Z3-Trau, 9.1x over Z3str3, and 13x over OSTRICH.


Keywords: String solvers · SMT solvers · Regular expressions


1 Introduction 


Satisfiability Modulo Theories (SMT) solvers that support theories over regular 
expression (regex) membership predicate and linear arithmetic over length of 
strings, such as CVC4 [25], Z3str3 [8], Norn [3], S3P [39], and HAMPI [22], have 
enabled many important applications in the context of analysis of string-intensive 
programs. Examples include symbolic execution and path analysis [11,32], as well 
as security analyzers that make use of string and regex constraints for input san- 
itization and validation [5,33,35]. Regular expression libraries in programming 
languages provide very intuitive and popular ways for developers to express
input validation, sanitization, or pattern matching constraints. Common to all 
these program analysis applications is the requirement for a rich quantifier-free 
(QF) first-order theory over strings, regexes, and integer arithmetic over string 
length. Unfortunately, the QF first-order theory of strings containing regex con- 
straints, linear integer arithmetic over string length, string-number conversion, 
and string concatenation (but no string equations¹) is undecidable [7,9]. In a
previous paper [19] we showed that a related QF first-order theory over word 
equations, linear integer arithmetic over string length, and string-number con- 
version predicate, but without regular expressions is also undecidable. It can also 
be shown that many non-trivial fragments of this theory are hard to decide (e.g., 
they have exponential-space lower bounds or are PSPACE-complete). Therefore, 
the task of creating efficient solvers to handle practical string constraints that 
belong to fragments of this theory remains a very difficult challenge. 

Many modern solvers typically handle regex constraints via an automata- 
based approach [4]. Automata-based methods are powerful and intuitive, but 
solvers must handle two key practical challenges in this setting. The first chal- 
lenge is that many automata operations, such as intersection, are computa- 
tionally expensive, yet handling these operations is required in order to solve 
constraints that are relevant to real-world applications. The second challenge 
relates to the integration of length information with regex constraints. Length 
constraints derived from automata may imply a disjunction of linear constraints, 
which is often more challenging for solvers to handle than a conjunction. 
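
To see why length information extracted from an automaton tends to be disjunctive, consider the lengths accepted by a small automaton. The sketch below is an illustration under our own assumptions: the NFA encodes the hypothetical regex (ab)* ∪ aaa, and `accepted_lengths` is a helper written for this example, not a function of Z3str3RE.

```python
def accepted_lengths(transitions, start, accepting, bound):
    """Lengths n <= bound such that the NFA accepts some word of length n."""
    lengths = []
    current = {start}                      # states reachable by words of length n
    for n in range(bound + 1):
        if current & accepting:
            lengths.append(n)
        # Advance one step on any letter; letters are irrelevant for lengths.
        current = {
            target
            for state in current
            for targets in transitions.get(state, {}).values()
            for target in targets
        }
    return lengths


# Epsilon-free NFA for (ab)* | aaa.
transitions = {
    "s": {"a": {"p1", "r1"}},
    "p1": {"b": {"p0"}},
    "p0": {"a": {"p1"}},
    "r1": {"a": {"r2"}},
    "r2": {"a": {"r3"}},
}
lengths = accepted_lengths(transitions, "s", {"s", "p0", "r3"}, 8)
```

The accepted lengths are the even numbers together with 3, i.e., a constraint of the shape (len = 3) ∨ (len = 2k for some k ≥ 0), which is a disjunction rather than a single linear constraint.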

As we demonstrate in this paper, the challenges of using automata-based 
methods can be addressed via prudent use of lazy extraction of implied length 
constraints and lazy regex heuristics in order to avoid performing expensive 
automata operations when possible. Inspired by this observation, we introduce 
a length-aware automata-based algorithm, Z3str3RE (and its implementation 
as part of the Z3 theorem prover [18]), for solving regex constraints and linear 
integer arithmetic over length of string terms. Z3str3RE takes advantage of the 
compactness of automata in representing regular expressions, while at the same 
time mitigating the effects of expensive automata operations such as intersection 
by leveraging length information and lazy heuristics. 


Contributions: We make the following contributions in this paper. 


Z3str3RE: An SMT Solver for Regular Expressions and Linear Integer 
Arithmetic over String Length. In Sect. 3, we present a novel decision procedure
for the QF first-order theory over regex membership predicate and linear
integer arithmetic over string length. We also describe its implementation,
Z3str3RE, as part of the Z3 theorem prover [8, 18]. The basic idea of our algorithm 
is that formulas obtained from practical applications have many implicit and 
explicit length constraints that can be used to reason efficiently about automata 
representing regexes. In Sect. 4 we present four heuristics that aid in solving reg- 
ular expression constraints and that can be leveraged in general settings. Specif- 
ically, we present a heuristic to derive explicit length information directly from 




1 We use the terms “word” and “string” interchangeably in this paper. 
regexes, a heuristic to perform expensive automata operations lazily, a heuristic
to refine lower and upper bounds on lengths of string terms with respect to regex 
constraints, and a prefix/suffix over-approximation heuristic to find empty inter- 
sections without constructing automata. All heuristics are designed to guide the 
search and avoid expensive automata operations whenever possible. Our solver, 
Z3str3RE, handles the above theory as well as extensions (e.g. word equations and 
substring function) via the existing support in Z3str3. We focus on the core algo- 
rithm as it is the centerpiece of our regex solver. We also carefully distinguish the 
novelty of our method from previous work. 


Empirical Evaluation and Comparison of Z3str3RE² Against CVC4,
OSTRICH, Z3seq, Z3str3, and Z3-Trau: To validate the practical efficacy 
of our algorithm, we present a thorough and extensive evaluation of Z3str3RE in 
Sect. 5, where we compare it against CVC4 [24], OSTRICH [15], Z3’s sequence 
solver [18], Z3str3 [42], and Z3-Trau [1] on 57256 instances across four regex- 
heavy benchmarks with connections to industrial security applications, includ- 
ing instances from Amazon Web Services and AutomatArk [16]. Z3str3RE sig- 
nificantly outperforms other state-of-the-art tools on the benchmarks consid- 
ered, having more correctly solved instances in total, lower running time, and 
fewer combined timeouts/unknowns than other tools, and no soundness errors 
or crashes. We note that almost 75% of the benchmarks were obtained from 
industrial applications or other solver developers. Over all the benchmarks, we 
demonstrate a speedup of 2.4x over CVC4, 4.4x over Z3seq, 6.4x over Z3-Trau, 
9.1x over Z3str3, and 13x over OSTRICH. 


2 Preliminaries 


This section contains some basic definitions as well as a brief overview of the 
theoretical results which shape the landscape in which we state our contribution. 


2.1 Basic Definitions 


We first describe the syntax and semantics of the input language supported by 
our solver Z3str3RE (Algorithm 1). 


Syntax: The core algorithm we present in Sect. 3 accepts formulas of the
quantifier-free many-sorted first-order theory of regex membership predicates 
over strings and linear integer arithmetic over string length function. The syn- 
tax of this theory is shown in Fig. 1. 

We denote the set of all string variables and all integer variables as $Var_{str}$ and
$Var_{int}$ respectively, and the set of all string constants and all integer constants
as $Con_{str}$ and $Con_{int}$ respectively. String constants are any sequence of zero or
more characters over a finite alphabet (e.g., ASCII).

Atomic formulas are regular expression membership constraints and linear 
integer (in)equalities. Regex terms are denoted recursively over regex concate- 
nation, union, Kleene star, and complement, and for a string constant w, the 


2 A reproduction package is available at https://figshare.com/s/5ae73a6f3c55f5c5e4cl.
$$\begin{array}{rcl}
F & ::= & Atom \mid F \wedge F \mid F \vee F \mid \neg F \\
Atom & ::= & t_{str} \in RE \mid A_{int} \\
A_{int} & ::= & t_{int} = t_{int} \mid t_{int} < t_{int} \\
RE & ::= & \text{“}w\text{”} \mid RE \cdot RE \mid RE \cup RE \mid RE^{*} \mid \overline{RE}, \text{ with } w \in Con_{str} \\
t_{int} & ::= & m \mid v \mid len(t_{str}) \mid t_{int} + t_{int} \mid m \cdot t_{int}, \text{ with } m \in Con_{int},\ v \in Var_{int} \\
t_{str} & ::= & s, \text{ with } s \in Var_{str} \cup Con_{str}
\end{array}$$


Fig. 1. Syntax of the input language accepted by Algorithm 1. Z3str3RE accepts an
extension of this syntax supporting word equations and other string terms.


regex term “w” represents the regular language containing w only. All regex 
terms must be grounded (i.e. cannot contain variables). Linear integer arith- 
metic terms include integer constants and variables, addition, and string length. 
Multiplication by a constant is expanded to repeated addition. String terms are 
either string variables or string constants. The length of a string S is denoted 
by len(S), the number of characters in S. The empty string has length 0. 

Our implementation Z3str3RE supports the theory in Fig. 1 extended with 
more expressive functions and predicates, including word equations (equality 
between arbitrary string terms) and functions such as indexof and substr that 
are needed for program analysis. Z3str3RE handles these terms via existing 
support in Z3str3. We focus on the above input language in the presentation of 
our algorithm in this paper and theoretical content. 


Semantics: We refer the reader to [42] for a detailed description of the semantics 
of standard terms in this theory. We focus here on the semantics of terms which 
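are less commonly known. The regex membership predicate $S \in R$, where $S$ is a
string term and $R$ is a regex term, is defined by structural recursion as follows:


$$\begin{array}{lcl}
S \in \text{“}w\text{”} & \text{iff} & S = w \text{ (where } w \text{ is a string constant)} \\
S \in R_1 \cdot R_2 & \text{iff} & \text{there exist strings } S_1, S_2 \text{ with } S = S_1 \cdot S_2,\ S_1 \in R_1,\ S_2 \in R_2 \\
S \in R_1 \cup R_2 & \text{iff} & \text{either } S \in R_1 \text{ or } S \in R_2 \\
S \in R^{*} & \text{iff} & \text{either } S = \epsilon \text{ or there exists a positive integer } n \text{ such that} \\
 & & S = S_1 \cdot S_2 \cdots S_n \text{ and } S_i \in R \text{ for each } i = 1 \ldots n \\
S \in \overline{R} & \text{iff} & S \notin R \text{ (that is, } S \in R \text{ is false)}
\end{array}$$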
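
The structural recursion above can be executed directly. The following Python sketch is a naive reference interpreter for the semantics, with regex terms encoded as nested tuples; this encoding and the function name `member` are assumptions of the sketch, and the interpreter is exponential in the worst case, unlike the automata-based procedures discussed in this paper.

```python
def member(s, r):
    """Decide S in R by structural recursion on the regex term r."""
    kind = r[0]
    if kind == "lit":                      # "w": exactly the constant w
        return s == r[1]
    if kind == "cat":                      # R1 . R2: try every split point
        return any(member(s[:i], r[1]) and member(s[i:], r[2])
                   for i in range(len(s) + 1))
    if kind == "union":                    # R1 U R2
        return member(s, r[1]) or member(s, r[2])
    if kind == "star":                     # R*: empty, or a non-empty first block
        if s == "":
            return True
        return any(member(s[:i], r[1]) and member(s[i:], r)
                   for i in range(1, len(s) + 1))
    if kind == "comp":                     # complement: S is not in R
        return not member(s, r[1])
    raise ValueError(f"unknown regex constructor: {kind}")


ab_star = ("star", ("lit", "ab"))
not_ab_star = ("comp", ab_star)
ab_star_then_c = ("cat", ab_star, ("lit", "c"))
```

In the star case, requiring the first block to be non-empty loses no generality, since empty blocks can be dropped from any decomposition, and it guarantees termination because the remaining suffix is strictly shorter.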


2.2 Theoretical Landscape 


To put our contributions in context, we briefly discuss a series of (un)decidability 
and complexity results developed around the fragments and extensions of the 
theory supported by Z3str3RE. 

In particular, we consider extensions which may have a string-number conversion
predicate numstr³ and/or string concatenation. Both extensions are


3 We introduce numstr, which is not part of the SMT-LIB standard, in order to sim- 
plify presentation of the theoretical results. The predicate is no more expressive than 
the standard operators str.to_int/str.from_int, except that those terms handle 
decimal inputs. The results easily extend to other (finite) alphabets including deci- 
mal/hexadecimal digits with appropriate case analysis. 
important to real-world program analysis. The predicate numstr has the syntax
numstr($t_{int}$, $t_{str}$) and the following semantics: numstr($n$, $s$) is true for a given
integer $n$ and string $s$ iff $s$ is a valid binary representation of the number $n$ (possibly
with leading zeros) and $n$ is a non-negative integer. That is, $s$ only contains
the characters 0 and 1, and $\sum_{i=0}^{len(s)-1} s'[i] \cdot 2^{len(s)-i-1} = n$, where $s'[i]$ is 0 if the
$i$th character in $s$ is ‘0’ and 1 if that character is ‘1’. String concatenation has
the syntax $t_{str} ::= t_{str} \cdot t_{str}$ and the usual semantics defined by SMT-LIB [10].
In the following, $T_{LRE,n,c}$ is the quantifier-free many-sorted first-order theory
of linear integer arithmetic over the string length function (L), regex (RE) membership
predicates, string-number conversion (n), and string concatenation (c)⁴.
The following quantifier-free fragments of $T_{LRE,n,c}$ are of interest: $T_{LRE,c}$, $T_{LRE}$,
$T_{RE,n,c}$, $T_{RE,n}$, and $T_{RE}$. The fragment $T_{LRE,c}$ (respectively, $T_{LRE}$) has all
functions and predicates of $T_{LRE,n,c}$ except the string-number conversion predicate
(and, respectively, except the string concatenation function). The theory $T_{RE,n,c}$
(respectively, $T_{RE,n}$ and $T_{RE}$) has all functions and predicates of $T_{LRE,n,c}$ except
the length function (and, respectively, the string concatenation function, and,
in the case of $T_{RE}$, the string-number conversion predicate). Note that while
all these theories allow equalities between terms of sort Int, they do not allow
equalities between terms of sort Str and cannot express general word equations.
The theoretical landscape is laid out as follows. Firstly, following the results
and techniques introduced in [3], we obtain that $T_{LRE,c}$ and, in particular, $T_{LRE}$
is decidable. A procedure deciding a formula from $T_{LRE,c}$ would first construct
for each variable (string or integer), based on the regular expression constraints
and length constraints which involve it, a finite automaton, then reduce the
problem of checking the satisfiability of the formula to checking whether the
constructed automata accept at least one string. A similar approach shows that
$T_{RE,n}$ is decidable. We observe that the presence of complements in regular
expressions is an inherent source of complexity for these procedures. Indeed, we
can easily encode the universality problem for regular expressions as a formula
in the theory $T_{RE}$. Moreover, given a regex $R$ of length $n$ over an alphabet $\Sigma$,
deciding whether $L(R) = \Sigma^*$ is equivalent to deciding the satisfiability of the
formula $\psi$ of $T_{LRE}$ consisting of the atoms $x \in \overline{R}$ and $x \in \Sigma^*$
($L(R) = \Sigma^*$ holds iff $\psi$ is unsatisfiable). Accordingly, by
the results from [37], if the choice for $R$ is restricted to regular expressions with
at least $k$ stacked complements, then there exists a positive rational number $c$
such that the considered problems are not contained in

$\mathrm{NSPACE}\bigl(\underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2^{cn}}}}}}}_{k-1 \text{ times}}\bigr)$.


In other words, the depth of the stack of complements of the formula translates
to the height of the tower of exponents in the complexity of deciding that
formula $\psi$. On the other hand, if we only consider regular expressions without
stacked complements, then the decision problems for the considered theories are 
PSPACE-complete. Indeed, the automata-based approach described above can 
be implemented to work in nondeterministic polynomial space; strongly related 
complexity results are obtained in [26,27]. 
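
To make the semantics of the numstr predicate introduced earlier in this section concrete, the following minimal sketch evaluates it on concrete values. The function name mirrors the predicate; treating the empty string as a representation of 0 is an assumption that follows the summation formula literally.

```python
def numstr(n: int, s: str) -> bool:
    """Return True iff s is a binary representation of the non-negative
    integer n, possibly with leading zeros."""
    if n < 0 or any(ch not in "01" for ch in s):
        return False
    # Sum of s'[i] * 2^(len(s) - i - 1) over all positions i.
    # Assumption: the empty string yields the empty sum 0.
    total = sum(int(ch) * 2 ** (len(s) - i - 1) for i, ch in enumerate(s))
    return total == n
```

Under this reading, `numstr(5, "101")` and `numstr(5, "00101")` both hold, while `numstr(5, "102")` and `numstr(-1, "1")` do not.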


4 Note that the fragments considered here do not include word equations.


Algorithm 1: Z3str3RE’s length-aware algorithm for the theory $T_{LRE}$ of
regex and integer constraints


Input  : Conjunction $\varphi$ of constraints of the form $S \in RE$, and conjunction $\psi$ of linear
         integer arithmetic constraints over string lengths
Output : SAT or UNSAT
 1  forall constraints $S \in RE$ in $\varphi$ do
 2      $L_S \leftarrow$ ComputeLengthAbstraction($S$);
 3      $L_{RE} \leftarrow$ ComputeLengthAbstraction($RE$);
 4      if $\psi \cup L_S \cup L_{RE}$ inconsistent then
 5          return UNSAT
 6      end
 7      refine $L_S$ as tightly as possible with respect to $L_{RE}$;
 8  end
 9  forall strings $S_i$ occurring in $\varphi$ do
10      let $R$ be the set of all regexes $RE$ in all terms $S_i \in RE$;
11      Automaton $I \leftarrow$ intersection of all automata corresponding to regexes in $R$;
12      if $I$ is empty then
13          return UNSAT
14      else
15          $L_I \leftarrow$ ComputeLengthAbstraction($I$);
16      end
17  end
18  $L_S \leftarrow$ the union of all length abstractions $L_S$;
19  $L_{RE} \leftarrow$ the union of all length abstractions $L_{RE}$;
20  $L_I \leftarrow$ the union of all length abstractions $L_I$;
21  if $\psi \cup L_S \cup L_{RE} \cup L_I$ has any solution $M$ then
22      forall strings $S$ occurring in $\varphi$ do
23          obtain $len(S)$ from $M$;
24          let $A$ be the set of all automata for all regexes $RE$ in all terms $S \in RE$;
25          Automaton $I \leftarrow$ intersection of all terms in $A$;
26          $S \leftarrow$ any string of length $len(S)$ in $I$;
27      end
28      return SAT
29  else
30      return UNSAT
31  end



At the opposite end of the spectrum is the theory $T_{LRE,n,c}$, which is undecidable.
Indeed, one can show that the more specific theory $T_{RE,n,c}$ (i.e. disallowing
arithmetic over length) has equivalent expressive power to the theory of word
equations with regular constraints, a predicate allowing the comparison of the
length of string terms, and the numstr predicate. Therefore, using the techniques
from [17], one can show that the theory $T_{LRE,n,c}$, in which we additionally allow
arithmetic over length, is undecidable [7].


3 Length-Aware Regular Expression Algorithm 


This section outlines the high-level algorithm used by Z3str3RE to solve the
satisfiability problem for $T_{LRE}$, and its extension based on length-aware heuristics.


3.1 High-Level Algorithm


The pseudocode presented in Algorithm 1 captures the essence of Z3str3RE’s
regex solver. Implementation-specific details are omitted for clarity. Z3str3RE
incorporates a version of this algorithm as part of a DPLL(T)-style interaction
with a core solver for Boolean combinations of atoms and other theory solvers 
able to handle arithmetic constraints and other terms. The tool handles string 
concatenation, string equality, and other string terms and predicates besides 
regex membership and string length via existing support in Z3str3, and leverages 
Z3’s integer arithmetic solver for arithmetic reasoning and model construction. 
This high-level presentation is expanded in Sect. 4, where we describe several
heuristics used in our implementation as part of the Z3str3RE tool. 

The algorithm takes as input a conjunction $\varphi$ of regex membership constraints
and a conjunction $\psi$ of linear integer arithmetic constraints over the lengths of
string variables appearing in $\varphi$. Without loss of generality, it is assumed that
all constraints in $\varphi$ are positive; negative constraints $S \notin RE$ can be replaced
with the positive complement $S \in \overline{RE}$. The algorithm returns SAT iff there
is a satisfying assignment to all string variables consistent with the regex constraints
$\varphi$ and length constraints $\psi$. It is assumed that the algorithm has access
to a decision procedure for checking the consistency of linear integer arithmetic 
constraints and for obtaining satisfying assignments to these constraints (in our 
implementation, this is fulfilled by Z3’s arithmetic solver). 

Lines 1-8 check whether the length information implied by $\varphi$ is consistent
with $\psi$. The function ComputeLengthAbstraction takes as input either a string
term $S$ or a regex $RE$ and computes a system of length constraints corresponding
to derived length information from string constraints or possible lengths of words
accepted by the regex $RE$. This abstraction is exact, not an over-approximation.
For example, given the regex $(abc)^*$ as input, ComputeLengthAbstraction would
construct the length abstraction $S \in (abc)^* \rightarrow len(S) = 3n, n \geq 0$ for a fresh
integer variable $n$. If the length abstractions are inconsistent with the given
length constraints, there can be no solution which satisfies both the length and
regex constraints, and hence the algorithm returns UNSAT. Otherwise, line 7
refines the length abstraction $L_S$ with respect to the regex $RE$. This improves
the efficiency of finding solutions to the augmented system of length constraints
later in the algorithm. In our implementation, the lower and upper bounds of the
length of $S$ are checked against the lengths of accepting paths in the automaton
for $RE$. For instance, if $L_S$ implies that $len(S) \geq 5$, but the shortest accepting
path in the automaton has length 7, the lower bound is refined to $len(S) \geq 7$.
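
The effect of the length abstraction and of the bound refinement on line 7 can be illustrated with a small sketch. It is a simplification under stated assumptions: instead of a symbolic constraint over a fresh integer variable, it enumerates the admissible lengths up to a fixed bound, and it covers only regexes built from constants, concatenation, union, and star. The tuple encoding and the names `lengths` and `refine_lower_bound` are hypothetical.

```python
def lengths(regex, bound):
    """Set of all word lengths <= bound accepted by a regex given as a
    nested tuple: ("const", w), ("concat", r1, r2), ("union", r1, r2),
    or ("star", r)."""
    kind = regex[0]
    if kind == "const":
        n = len(regex[1])
        return {n} if n <= bound else set()
    if kind == "concat":
        left, right = lengths(regex[1], bound), lengths(regex[2], bound)
        return {a + b for a in left for b in right if a + b <= bound}
    if kind == "union":
        return lengths(regex[1], bound) | lengths(regex[2], bound)
    if kind == "star":
        steps = lengths(regex[1], bound) - {0}
        reachable, frontier = {0}, [0]
        while frontier:  # close {0} under adding any step length
            current = frontier.pop()
            for step in steps:
                total = current + step
                if total <= bound and total not in reachable:
                    reachable.add(total)
                    frontier.append(total)
        return reachable
    raise ValueError(f"unsupported regex node: {kind}")


def refine_lower_bound(lower, admissible):
    """Smallest admissible length >= lower, or None if there is none."""
    candidates = [n for n in admissible if n >= lower]
    return min(candidates) if candidates else None
```

For the regex $(abc)^*$ and a bound of 10, `lengths(("star", ("const", "abc")), 10)` yields the multiples of 3 up to 9, and a lower bound of 5 on the string length is refined to 6.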

Lines 9-17 check that the intersection of all automata constraining each string 
variable is non-empty. Although intersecting automata is relatively expensive (as 
it runs in quadratic time w.r.t. the size of the intersected automata), it is still 
more efficient to do this before enumerating length assignments, and taking the 
intersection here is necessary to maintain soundness. (The heuristics in Sect. 4 
illustrate some methods by which this computation can be made more efficient 
or even avoided.) If the length information is consistent, the algorithm adds a
length abstraction constraint $L_I$ encoding the lengths of all possible solutions to
the intersection $I$.
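
As an illustration of lines 9-17, the sketch below builds the intersection of two automata with the standard product construction, tests it for emptiness, and lists the accepted word lengths up to a bound. The NFA encoding (a triple of start state, accepting set, and a transition dictionary) and all function names are assumptions made for this example; they are not taken from the Z3str3RE implementation.

```python
from collections import deque


def intersect(a, b):
    """Product of two NFAs, each given as (start, accepting, delta), where
    delta maps (state, symbol) to a set of successor states. Only the
    reachable part of the product is built."""
    (start_a, acc_a, delta_a), (start_b, acc_b, delta_b) = a, b
    start = (start_a, start_b)
    accepting, delta = set(), {}
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in acc_a and q in acc_b:
            accepting.add((p, q))
        for (src, sym), targets_a in delta_a.items():
            if src != p:
                continue
            for ta in targets_a:
                for tb in delta_b.get((q, sym), ()):
                    nxt = (ta, tb)
                    delta.setdefault(((p, q), sym), set()).add(nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return start, accepting, delta


def is_empty(nfa):
    """The product only contains reachable states, so the language is
    empty exactly when no accepting state was reached."""
    return not nfa[1]


def accepted_lengths(nfa, bound):
    """Lengths n <= bound for which the NFA accepts some word of length n."""
    start, accepting, delta = nfa
    current, result = {start}, set()
    for n in range(bound + 1):
        if current & accepting:
            result.add(n)
        current = {t for (src, _), targets in delta.items()
                   if src in current for t in targets}
    return result
```

With hand-built automata for $(abc)^*$ and $a^+ \mid b^+$, the product has no accepting state and is therefore empty, whereas intersecting the automaton for $(abc)^*$ with itself accepts exactly the lengths 0, 3, and 6 up to a bound of 7.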

By construction of $\psi \cup L_S \cup L_{RE} \cup L_I$, the input formula is satisfiable iff
this system of integer constraints has a solution. If such a solution $M$ exists,
lines 22-28 construct an assignment for each string variable with respect to its
length assignment. A solution must exist as the lengths of strings considered are
limited to those lengths for which the intersection of the corresponding automata
is non-empty; the solution is consistent by construction with both the input
length constraints and string constraints. If a solution $M$ does not exist, then the
constraints $\varphi \wedge \psi$ are not jointly satisfiable, and the algorithm returns UNSAT.
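
Line 26 of Algorithm 1 picks any string of a prescribed length from the intersection automaton. One possible way to do this, sketched below under the assumption of the same hypothetical NFA encoding as a (start, accepting, delta) triple, is to first compute backwards which states can still reach acceptance in exactly k steps and then walk forwards along such states.

```python
def word_of_length(nfa, length):
    """Return some accepted word with exactly `length` symbols, or None.
    nfa = (start, accepting, delta); delta maps (state, symbol) to a set
    of successor states."""
    start, accepting, delta = nfa
    # good[k]: states from which an accepting state is reachable in
    # exactly k steps.
    good = [set(accepting)]
    for _ in range(length):
        previous = good[-1]
        good.append({src for (src, _), targets in delta.items()
                     if targets & previous})
    if start not in good[length]:
        return None
    word, state = [], start
    for remaining in range(length, 0, -1):
        # Invariant: state is in good[remaining], so a viable step exists.
        for (src, sym), targets in delta.items():
            viable = targets & good[remaining - 1] if src == state else set()
            if viable:
                word.append(sym)
                state = next(iter(viable))
                break
    return "".join(word)
```

For a three-state automaton accepting $(abc)^*$, a requested length of 6 produces `"abcabc"`, and a requested length of 4 produces `None` because no accepted word has that length.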

We demonstrate soundness, completeness, and termination of Algorithm 1 as
follows. On line 4 we check whether $\psi \cup L_S \cup L_{RE}$ is satisfiable. If not, we return
UNSAT on line 5. Lines 9-17 check whether the intersection of regex constraints
for each string variable is empty. If so, we return UNSAT; otherwise, we add
an additional constraint encoding the lengths of all strings in this intersection.
Therefore, $\psi \cup L_S \cup L_{RE} \cup L_I$ has a solution iff there exists an assignment to
each string variable that is consistent with the arithmetic constraints $\psi$ and that
corresponds to the length of a solution in the intersection of its regex constraints
$L_I$. Lines 22-28 construct this solution if it exists. Therefore, Algorithm 1 is
a decision procedure for the QF first-order theory of regex constraints, string
length, and linear integer arithmetic.

As previously mentioned, Z3str3RE supports other high-level operations that 
are not part of this theory via existing support in Z3str3. An extension to this 
algorithm provides support for including these operations, which may render the 
theory undecidable. These terms are not in Algorithm 1 because their inclusion 
would make the algorithm incomplete (see Sect. 2.2). Algorithm 1 describes the 
part of the implementation which is novel and complete. 


4 Length-Aware and Prefix/Suffix Heuristics in Z3str3RE 


In this section, we describe the length-aware heuristics that are used in Z3str3RE 
to improve the efficiency of regular expression reasoning. We present an empirical 
evaluation of the power of these heuristics in Sect. 5.6. 


4.1 Computing Length Information from Regexes 


The first length-aware heuristic is used when constructing the length abstrac- 
tion on line 3. If the regex can be easily converted to a system of equa- 
tions describing the lengths of all possible solutions (for instance, in the 
case when it does not contain any complements or intersections), this sys- 
tem can be returned as the abstraction without constructing the automaton 
for RE yet. For example, as illustrated in Sect. 3.1, given the regex $(abc)^*$
as input, ComputeLengthAbstraction would construct the length abstraction
$S \in (abc)^* \rightarrow len(S) = 3n, n \geq 0$ for a fresh integer variable $n$. Note that this
can be done from the syntax of the regex without converting it to an automaton.
Length information could also be derived from the automaton by, for example,
constructing a corresponding unary automaton and converting it to Chrobak
normal form. However, performing automata construction lazily means we cannot
rely on having an automaton in all cases; this technique also provides length
information even when constructing an automaton would be expensive.

In cases where we cannot directly infer the length abstraction, the heuristic 
will fix a lower bound on the length of words in RE, and possibly an upper 
bound if it exists. Reasoning about the length abstraction early in the proce- 
dure gives our algorithm the opportunity to detect inconsistencies before expen- 
sive automaton operations are performed. This gives the arithmetic solver more 
opportunities to propagate facts discovered by refinement and potentially more 
chances to find inconsistencies or learn further derived facts. 
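
The fallback to lower and upper bounds can be sketched directly on the regex syntax. This is an illustrative reading, not the implementation: the tuple encoding and the name `length_bounds` are hypothetical, a missing upper bound is represented by `math.inf`, and a complemented term is conservatively given the trivial bounds.

```python
import math


def length_bounds(regex):
    """(lower, upper) bounds on the length of words matched by a regex
    given as a nested tuple; upper is math.inf when no bound exists."""
    kind = regex[0]
    if kind == "const":
        n = len(regex[1])
        return n, n
    if kind == "concat":
        lo1, hi1 = length_bounds(regex[1])
        lo2, hi2 = length_bounds(regex[2])
        return lo1 + lo2, hi1 + hi2
    if kind == "union":
        lo1, hi1 = length_bounds(regex[1])
        lo2, hi2 = length_bounds(regex[2])
        return min(lo1, lo2), max(hi1, hi2)
    if kind == "star":
        _, hi = length_bounds(regex[1])
        # Zero repetitions are always allowed; the result is unbounded
        # unless the body can only match the empty string.
        return 0, (0 if hi == 0 else math.inf)
    if kind == "complement":
        # Assumption: no syntactic bound is attempted under complement.
        return 0, math.inf
    raise ValueError(f"unsupported regex node: {kind}")
```

For example, the concatenation of the constant `ab` with `c*` has bounds (2, ∞), while the union of the constants `a` and `bcd` has bounds (1, 3).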


4.2 Optimizing Automata Operations via Length Information 


Similarly, computing the intersection $I$ in line 11 is done lazily in the implementation
of Z3str3RE and over several iterations of the algorithm. The most
expensive intersection operations can be performed at the end of the search, after 
as much other information as possible has been learned. We use the following 
heuristics recursively to estimate the “cost” of each operation without actually 
constructing any automata: 


— For a string constant, the estimated cost is the length of the string. 

— For a concatenation or a union of two regex terms X and Y, the estimated 
cost is the sum of the estimates for X and Y. 

— For a regex term X*, the estimated cost is twice the estimate for X. 

— For a regex term X under complement, the estimated cost is the product of 
the estimates obtained from subterms of X. 


In essence, the constructions which “blow up” the least are expected to be the 
least expensive and are performed first. In the best-case scenario, this could mean 
avoiding the most expensive operations completely if an intersection of smaller 
automata ends up being empty. In the worst case, all intersections are computed 
eventually, as this is necessary to maintain the soundness of our approach. 
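
The cost estimate described above can be written as a short recursive function. The sketch uses a hypothetical tuple encoding of regex terms. The rule for complement is stated only briefly in the text, so the reading used here (the product of the estimates of the immediate subterms of X, falling back to the estimate of X itself when X is a constant) is an assumption.

```python
def estimated_cost(regex):
    """Estimated automaton-construction cost of a regex given as a nested
    tuple, computed without building any automaton."""
    kind = regex[0]
    if kind == "const":
        return len(regex[1])
    if kind in ("concat", "union"):
        return estimated_cost(regex[1]) + estimated_cost(regex[2])
    if kind == "star":
        return 2 * estimated_cost(regex[1])
    if kind == "complement":
        inner = regex[1]
        if inner[0] == "const":
            # Assumption: a constant has no subterms; use its own estimate.
            return estimated_cost(inner)
        product = 1
        for sub in inner[1:]:
            product *= estimated_cost(sub)
        return product
    raise ValueError(f"unsupported regex node: {kind}")


def cheapest_first(regexes):
    """Order regex terms so that the cheapest constructions come first."""
    return sorted(regexes, key=estimated_cost)
```

Sorting the regex terms that constrain one string variable with `cheapest_first` gives an order in which the intersections expected to be least expensive are attempted before the costly ones.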


4.3 Leveraging Length Information to Optimize Search 


Our implementation communicates integer assignments and lower/upper bounds
with the external arithmetic solver in order to prune the search space. Checking
for length assignments is done in practice as an abstraction-refinement loop
involving Z3’s arithmetic solver. The arithmetic solver proposes a single candi- 
date model for the system of arithmetic constraints; the regex algorithm checks 
whether that model has a corresponding solution over the regex constraints. If it 
does not, it asserts a conflict clause blocking that combination of length assign- 
ments and regex constraints from being considered again. This is necessary in 
a DPLL(T)-style solver such as Z3 in order to handle Boolean structure in the 
input formula. 
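
The abstraction-refinement loop can be sketched for a single string variable as follows. This is a toy model under loud assumptions: the arithmetic solver is replaced by a brute-force search over a bounded range, the regex check by a caller-supplied predicate, and the conflict clause by a set of blocked lengths. None of the names below belong to Z3 or Z3str3RE.

```python
def solve_length(arith_ok, regex_admits, max_len):
    """Find a length accepted by both the arithmetic constraints and the
    regex constraints, or return None if every candidate is refuted.

    arith_ok(n):     stand-in for the arithmetic solver's constraints.
    regex_admits(n): stand-in for the regex check on a candidate length.
    """
    blocked = set()  # plays the role of the asserted conflict clauses
    while True:
        # The "arithmetic solver" proposes a single candidate model.
        candidate = next((n for n in range(max_len + 1)
                          if n not in blocked and arith_ok(n)), None)
        if candidate is None:
            return None
        if regex_admits(candidate):
            return candidate
        blocked.add(candidate)  # never propose this length again
```

For instance, with the arithmetic constraint `n >= 5` and a regex whose words have lengths divisible by 3, the loop rejects 5 and then returns 6.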


4.4 Prefix/Suffix Over-Approximation Heuristic


As previously mentioned, computing automata intersections is expensive, but 
in many cases it is necessary in order to prove that a set of intersecting regex 
constraints has no solution. In some cases, this can be done “by inspection” from 
the syntax of the regex terms without constructing or intersecting any automata. 
From the structure of a regular expression, it is easy to determine the first 
letter of all possible accepted strings that it matches. If several regexes would be 
intersected over the same string term, this is used to check whether these regexes 
have a prefix of length one in common. If they do not, their intersection cannot 
contain any strings other than the empty string (and we can also check whether 
the empty string could be accepted by a similar syntactic approach). A similar 
construction for suffixes of length 1 is also used. In this way, the heuristic can 
infer that the intersection of several regex constraints is either empty, resulting in 
a conflict clause, or can only contain the empty string, resulting in a new fact and 
a simplification of the formula — without actually constructing the intersection 
or, in fact, constructing any automata for these regexes. 
For example, consider the following regex constraints on a variable X: 


$X \in (abc)^*$
$X \in a^+ \mid b^+$


In the first constraint, the pattern abc is matched zero or more times, and could 
be empty; therefore, either X is empty or it must start with a and end with c. 
In the second constraint, each pattern is matched at least once, and cannot be 
empty; therefore X must start with a or b, end with a or b, and cannot be the 
empty string. Observe that according to the prefix heuristic, these constraints 
are consistent, since a is a valid prefix of both regexes; however, according to 
the suffix heuristic, they are inconsistent, as the possible suffixes a and b of the 
second regex do not include c, and the empty string is not a solution to both 
constraints. Hence these constraints are not jointly satisfiable. 

As demonstrated, all of these facts are derived from the syntax of the reg- 
ular expression without constructing any automata. By constructing an over- 
approximation of the possible solutions of X allowed by regex constraints, the 
heuristic can determine that their intersection is empty (or can only contain the 
empty string) without computing it precisely using expensive automata-based 
reasoning. We limit this heuristic to the first letter as each additional letter 
requires exponentially more space. 
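
The prefix/suffix check can be sketched purely on regex syntax by computing, for each term, whether it accepts the empty string and which letters can begin or end a non-empty match. The tuple encoding, the function names, and the three result labels are hypothetical choices for this illustration.

```python
def nullable(r):
    """True iff the regex can match the empty string."""
    kind = r[0]
    if kind == "const":
        return r[1] == ""
    if kind == "concat":
        return nullable(r[1]) and nullable(r[2])
    if kind == "union":
        return nullable(r[1]) or nullable(r[2])
    if kind == "star":
        return True
    if kind == "plus":
        return nullable(r[1])
    raise ValueError(f"unsupported regex node: {kind}")


def first(r):
    """Letters that can start a non-empty word matched by r."""
    kind = r[0]
    if kind == "const":
        return {r[1][0]} if r[1] else set()
    if kind == "concat":
        return first(r[1]) | (first(r[2]) if nullable(r[1]) else set())
    if kind == "union":
        return first(r[1]) | first(r[2])
    return first(r[1])  # star, plus


def last(r):
    """Letters that can end a non-empty word matched by r."""
    kind = r[0]
    if kind == "const":
        return {r[1][-1]} if r[1] else set()
    if kind == "concat":
        return last(r[2]) | (last(r[1]) if nullable(r[2]) else set())
    if kind == "union":
        return last(r[1]) | last(r[2])
    return last(r[1])  # star, plus


def prefix_suffix_check(regexes):
    """Classify the intersection of several regexes over one variable."""
    common_first = set.intersection(*(first(r) for r in regexes))
    common_last = set.intersection(*(last(r) for r in regexes))
    if common_first and common_last:
        return "unknown"     # the heuristic is inconclusive
    if all(nullable(r) for r in regexes):
        return "only-empty"  # at most the empty string is shared
    return "conflict"        # the intersection is empty
```

Applied to the two constraints of the example, $(abc)^*$ and $a^+ \mid b^+$, the first letters overlap in `a`, the last letters `{c}` and `{a, b}` do not overlap, and the second regex is not nullable, so the check reports a conflict.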


5 Empirical Results 


In this section, we describe the empirical evaluation of Z3str3RE, our implemen- 
tation of the length-aware regular expression algorithm presented in Sect. 3, to 
validate the effectiveness of the techniques presented. We evaluate the correct- 
ness and efficiency of our tool against other solvers, as well as against different 
configurations of the tool in order to demonstrate the efficacy of our heuristics.

Fig. 2. Cactus plot summarizing performance on all benchmarks. Z3str3RE has the
best overall performance.


Table 1. Combined results of string solvers on all benchmarks. Z3str3RE has the
best overall performance on all benchmarks compared to CVC4, OSTRICH, Z3seq,
Z3str3, and Z3-Trau and the biggest lead with a score of 1.02.


                        CVC4        Z3seq       OSTRICH     Z3-Trau     Z3str3      Z3str3RE
Sat                     33310       31550       22499       24133       27563       33820
Unsat                   21897       21411       19281       21038       18566       22339
Unknown                 0           0           10901       6504        1164        291
Timeout                 2049        4295        4575        5581        9963        806
Soundness error         0           0           28          5325        13          0
Program crashes         0           0           0           2477        2           0
Total correct           55207       52961       41752       39846       46116       56159
Contribution score      95.99       19.87       —           —           —           145.07
Time (s)                57625.499   103487.844  305243.413  150288.386  213698.954  23339.266
Time w/o timeouts (s)   16645.499   17587.844   213743.413  38668.386   14438.954   7219.266


5.1 Empirical Setup and Solvers Used 


We compare Z3str3RE against five other leading string solvers available today.
CVC4 [24] is a general-purpose SMT solver which reasons about strings and regular
expressions algebraically. Z3str3 [8] is the latest solver in the Z3-str family, and uses
a reduction to word equations to reason about regular expressions. Z3str3RE is
based on Z3str3 except for the length-aware algorithm and heuristics described in
Sects. 3 and 4. Z3seq [36] is the Z3 sequence solver, implemented by Nikolaj Bjørner
and others at Microsoft Research, as part of the Z3 theorem prover. Z3seq uses a
new theory of derivatives for solving extended regular expressions. Z3-Trau [1] is
also based on Z3 and uses an automata-based approach known as “flat automata”
with both under- and over-approximations. OSTRICH [15] uses a reduction from
string functions (including word equations) to a model-checking problem that is
solved using the SLOTH tool and an implementation of IC3. We used CVC4’s
binary version 1.8, commit 59e9c87 of Z3str3, the sequence solver included in Z3’s
binary version 4.8.9, Z3-Trau commit 1628747, and OSTRICH version 1.0.1. All
of these tools support the full SMT-LIB standard for strings. We did not compare
against the Z3str2 [42] or Norn [3] solvers as neither tool supports the str.to_int
or str.from_int terms which represent string-number conversion, which are used
in some sanitizer benchmarks. Additionally, Norn does not support many of the
other high-level string terms such as indexof or substr which are used in the
benchmarks. The ABC [4] solver handles string and length constraints by conversion
to automata. However, their method over-approximates the solution set of
the input formula which may be unsound. Thus, we excluded ABC from our evaluation.
We also were unable to evaluate against Trau [2] as the provided source
code did not compile. All evaluations were performed on a server running Ubuntu
18.04.4 LTS with two AMD EPYC 7742 processors and 2TB of memory using the
ZaligVinder [23] benchmarking framework. A 20s timeout was used. We cross-verified
the models generated by each solver for satisfiable instances against all
competing solvers.

Fig. 3. Cactus plot summarizing detailed performance on the AutomatArk benchmark.


5.2 Benchmarks 


The comparison was performed on four suites of regex-based benchmarks with a 
total of 57256 instances. In total, almost 75% of the instances in our evaluation 
came from previously published industrial benchmarks or other solver devel- 
opers. Under 10% contain extended regular expressions (having either comple- 
ment or intersection, or both) and 53% contain only regex predicates. Only 201 
instances fall into the undecidable theory $T_{LRE,n,c}$. More details can be found in
[7] where we analyse the benchmarks in greater detail. We briefly describe each
benchmark’s origin and composition.

Table 2. Detailed results for the AutomatArk benchmark. Z3str3RE has the biggest
lead with a score of 1.01.


                        CVC4        Z3seq       OSTRICH     Z3-Trau     Z3str3      Z3str3RE
Sat                     14376       14204       11461       8157        9151        14437
Unsat                   5304        5290        5381        3817        4385        5422
Unknown                 1           0           15          5045        406         0
Timeout                 298         485         3122        2960        6037        120
Soundness error         0           0           0           1300        0           0
Program crashes         0           0           0           1063        2           0
Total correct           19680       19494       16842       10674       13536       19859
Contribution score      1.0         1.0         2.0         —           0.0         0.5
Time (s)                8789.425    18718.425   158910.126  80021.352   126825.967  3925.150
Time w/o timeouts (s)   2829.425    9018.425    96470.126   20821.352   6085.967    1525.150


AutomatArk is a set of 19979 benchmarks based on a collection of real-world 
regex queries collected by Loris D’Antoni from the University of Wisconsin, 
Madison, USA. We translated the provided regexes [16] into SMT-LIB syntax 
resulting in two sets of instances: a “simple” set with a single regex membership 
predicate per instance, and a “complex” set with 2-5 regex membership predi- 
cates (possibly negated) over a single variable per instance. The instances in this 
benchmark are evenly divided between simple and complex problems. 


RegEx-Collected is a set of 22425 instances taken from existing benchmarks 
with the purpose of evaluating the performance of solvers against real-world 
regex instances. This benchmark includes all instances from the AppScan [41], 
BanditFuzz,⁵ JOACO [38], Kaluza [33], Norn [3], Sloth [21], Stranger [40], and
Z3str3-regression [8] benchmarks in which at least one regex membership con- 
straint appears. No additional restrictions are placed on which instances were 
chosen besides the presence of at least one regex membership predicate. This 
benchmark tests solvers against challenging instances from widely distributed 
benchmark suites. Additionally, these instances may contain regex terms in any 
context and with any other supported string operators. As a result, the bench- 
mark is also exemplary of how string solvers perform in the presence of operations 
and predicates that are relevant to program analysis. 


StringFuzz-regex-generated is a set of 4170 problems generated by the
StringFuzz string instance fuzzing tool [12]. These instances only contain regular
expression and linear arithmetic constraints. This benchmark isolates the regex
performance of a string solver in the context of mixed regex and arithmetic constraints.
Tools with better regex and arithmetic solvers should perform better.
Fuzz testing, as performed in the StringFuzz-regex-generated benchmark,
has been shown to be extremely productive in discovering bugs and performance
issues in SMT solvers. We included these instances because they exercise the
performance of the solver on regex-heavy constraints in a way that the industrial
benchmarks or instances obtained from other solver developers cannot.


5 The BanditFuzz benchmark is an unpublished suite obtained via private communication
with the authors.

6 Other benchmark suites available to us, including the PyEx, PISA, and Kausler
benchmarks, did not include any regex membership constraints.

Fig. 4. Cactus plot showing detailed results for the StringFuzz-regex-generated
benchmark.


Table 3. Detailed results for the StringFuzz-regex-generated benchmark. Z3str3RE
has the biggest lead with a score of 1.25.


                        CVC4        Z3seq       OSTRICH     Z3-Trau     Z3str3      Z3str3RE
Sat                     2316        2001        2005        1590        3227        3231
Unsat                   442         697         819         824         32          830
Unknown                 0           0           1           192         0           0
Timeout                 1412        1472        1345        1564        911         109
Soundness error         0           0           0           8           0           0
Program crashes         0           0           0           192         0           0
Total correct           2758        2698        2824        2406        3259        4061
Contribution score      0.0         3.17        2.0         —           0.0         0.17
Time (s)                31236.207   35409.000   51571.800   37323.550   22031.636   5116.456
Time w/o timeouts (s)   2996.207    5969.000    24671.800   6043.550    3811.636    2936.456


StringFuzz-regex-transformed is a set of 10682 instances which were produced
by transforming existing industrial instances with StringFuzz. We applied
StringFuzz’s transformers to instances supplied by Amazon Web Services related
to security policy validation, handcrafted instances inspired by real-world input
validation vulnerabilities, and the regex test cases in Z3str3’s regression test
suite. The instances contain regex constraints, arithmetic and length constraints,
string-number conversion (numstr), string concatenation, word equations, and
other high-level string operations such as charAt, indexof, and substr. As is
typical for fuzzing in software testing, the goal is to create a suite of tests from a
given input that are similar in structure but that explore interesting behaviour
not captured by a “typical” industrial instance. These transformed instances are
often harder than the original industrial ones.

Fig. 5. Cactus plot showing detailed results for the StringFuzz-regex-transformed
benchmark.


Table 4. Detailed results for the StringFuzz-regex-transformed benchmark. Z3str3RE
has the biggest lead with a score of 1.0.


                        CVC4        Z3seq       OSTRICH     Z3-Trau     Z3str3      Z3str3RE
Sat                     4541        4633        3899        3672        4417        4599
Unsat                   6016        5976        4549        6282        4817        6037
Unknown                 0           0           2233        721         0           6
Timeout                 125         73          1           7           1448        40
Soundness error         0           0           5           1241        0           0
Program crashes         0           0           0           718         0           0
Total correct           10557       10609       8443        8713        9234        10636
Contribution score      0.5         0.0         —           —           0.0         4.83
Time (s)                2969.643    2066.935    23094.737   722.545     29788.245   1095.209
Time w/o timeouts (s)   469.643     606.935     23074.737   582.545     828.245     295.209


5.3 Comparison and Scoring Methods 


We compare solvers directly on the total number of correctly solved cases,
total time with and without timeouts, and total number of soundness errors and
program crashes. We also computed the biggest lead winner and largest contribution
ranking following the scoring system used by the SMT Competition [6].
Briefly, the biggest lead measures the ratio of correct answers of the leading
tool to correct answers of the next-ranked tool, and the contribution score
measures what proportion of instances were solved the fastest by that solver.
In accordance with the SMT Competition guidelines, a solver receives no contribution
score (denoted as —) if it produces any incorrect answers on a given
benchmark. In both cases, higher scores are better.
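
As a worked example of the biggest lead score, the sketch below applies the brief description above (correct answers of the leader divided by correct answers of the runner-up) to the "Total correct" rows of Tables 1 and 5. The function name is hypothetical, and the official SMT Competition rules may differ in detail from this simplified ratio.

```python
def biggest_lead(total_correct):
    """Return (leader, ratio), where ratio is the leader's number of
    correct answers divided by that of the next-ranked solver."""
    ranked = sorted(total_correct.items(), key=lambda kv: kv[1], reverse=True)
    (leader, best), (_, second) = ranked[0], ranked[1]
    return leader, best / second


# "Total correct" row of Table 1 (all benchmarks).
all_benchmarks = {"CVC4": 55207, "Z3seq": 52961, "OSTRICH": 41752,
                  "Z3-Trau": 39846, "Z3str3": 46116, "Z3str3RE": 56159}
# "Total correct" row of Table 5 (RegEx-Collected).
regex_collected = {"CVC4": 22212, "Z3seq": 20160, "OSTRICH": 13643,
                   "Z3-Trau": 18053, "Z3str3": 20087, "Z3str3RE": 21603}

for name, table in [("all", all_benchmarks), ("RegEx-Collected", regex_collected)]:
    leader, ratio = biggest_lead(table)
    print(f"{name}: {leader} leads with {ratio:.2f}")
```

Rounded to two decimals, the ratios are 1.02 for Z3str3RE over all benchmarks and 1.03 for CVC4 on RegEx-Collected, matching the captions of Tables 1 and 5.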


5.4 Analysis of Empirical Results 


The cactus plot in Fig. 2 shows the cumulative time taken by each solver on all 
cases in increasing order of runtime. Solvers that are further to the right and 
closer to the bottom of the plot have better performance. 

Overall, Z3str3RE solves more instances and performs better than all competing
solvers. Across all benchmarks, Z3str3RE is over 2.4x faster than CVC4,
4.4x faster than Z3seq, 6.4x faster than Z3-Trau, 9.1x faster than Z3str3, and 
13x faster than OSTRICH (including timeouts). Additionally, Z3str3RE has 
fewer combined timeouts and unknowns than other tools considered, and no 
soundness errors or crashes. We summarize these results in Table 1. Notably, 
both Z3-Trau [1] and OSTRICH [15] had significant runtime issues in our exper- 
iments. Z3-Trau produced 5325 soundness errors and 2477 crashes on our bench- 
marks (13% of all instances), which is significantly higher than other tools used. 
OSTRICH produced 10901 “unknown” responses on the benchmarks (19% of all 
instances), due to both unsupported features and crashes, and also produced 28 
soundness errors. Over all benchmarks, Z3str3RE produced 291 unknowns. There 
are several potential reasons for this; the solver may have encountered a resource 
limit and returned UNKNOWN, or it may have detected non-termination and 
returned UNKNOWN instead of looping forever. According to SMT Competition 
scoring, Z3str3RE won the division across all benchmarks with a lead of 1.02, 
and had the largest contribution to the division with a score of 145.07. CVC4 
had a contribution score of 95.99, and Z3seq had a score of 19.87. OSTRICH, Z3- 
Trau, and Z3str3 received no contribution score as they each returned at least 
one incorrect answer. The presented results are typical of the performance of 
the evaluated tools over multiple runs. Results were cross-validated within runs 
and between multiple runs. For a random single instance, the sample variance 
in execution time for 100 runs is 0.001 (0.07% of average execution time). Over 
57256 instances, this is negligible. 
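
The speedup factors quoted above follow from the "Time (s)" row of Table 1, which includes timeouts. The short sketch below recomputes them as ratios of total running times; the dictionary and function names are chosen only for this illustration.

```python
# "Time (s)" row of Table 1, including timeouts.
total_time = {"CVC4": 57625.499, "Z3seq": 103487.844, "OSTRICH": 305243.413,
              "Z3-Trau": 150288.386, "Z3str3": 213698.954,
              "Z3str3RE": 23339.266}


def speedups(times, reference="Z3str3RE"):
    """Ratio of each solver's total time to the reference solver's time."""
    base = times[reference]
    return {solver: t / base for solver, t in times.items()
            if solver != reference}


for solver, factor in sorted(speedups(total_time).items(), key=lambda kv: kv[1]):
    print(f"Z3str3RE is {factor:.2f}x faster than {solver}")
```

Each computed ratio is slightly above the corresponding figure in the text, which is consistent with the factors being reported as lower bounds ("over 2.4x").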

The empirical results make clear the efficacy of length-aware automata-based 
techniques for regular expression constraints when accompanied with length con- 
straints (which is typical for industrial instances). The effectiveness of our tech- 
nique is demonstrated particularly by comparing Z3str3RE with Z3str3, as the 
only differences between these tools are the length-aware regex algorithm and 
heuristics implemented in Z3str3RE and bug fixes. By improving the regex algo- 
rithm and applying our heuristics, we achieved a speedup of over 9x and solved 
over 10000 more cases than Z3str3.

Fig. 6. Cactus plot showing detailed performance for the RegEx-Collected benchmark.


Table 5. Detailed results for the RegEx-Collected benchmark. CVC4 has the biggest
lead with a score of 1.03.


                        CVC4        Z3seq       OSTRICH     Z3-Trau     Z3str3      Z3str3RE
Sat                     12077       10712       5134        10714       10768       11553
Unsat                   10135       9448        8532        10115       9332        10050
Unknown                 0           0           8652        546         758         285
Timeout                 213         2265        107         1050        1567        537
Soundness error         0           0           23          2776        13          0
Program crashes         0           0           0           504         0           0
Total correct           22212       20160       13643       18053       20087       21603
Contribution score      91.06       3.51        —           —           —           14.54
Time (s)                14610.224   47293.484   71666.750   32220.939   35053.106   13202.451
Time w/o timeouts (s)   10350.224   1993.484    69526.750   11220.939   3713.106    2462.451


5.5 Detailed Experimental Results 


Figure 3 and Table 2 show the detailed results for the AutomatArk benchmark.
In this benchmark, Z3str3RE solves more instances than all other solvers, has the 
fewest timeouts/unknowns, and has the fastest overall running time. Including 
timeouts, Z3str3RE is 2.2x faster than CVC4, 4.7x faster than Z3seq, 40.4x 
faster than OSTRICH, 20.4x faster than Z3-Trau, and 32.3x faster than Z3str3. 

Figure 4 and Table 3 show the detailed results for the StringFuzz-regex-generated
benchmark. Z3str3RE solves more instances than all other solvers,
has over 90% fewer timeouts than other solvers, no unknowns, and has the fastest
overall running time. Including timeouts, Z3str3RE is 6.1x faster than CVC4,
6.9x faster than Z3seq, 10x faster than OSTRICH, 7.3x faster than Z3-Trau,
and 4.3x faster than Z3str3.

Fig. 7. Cactus plot comparing performance by disabling individual heuristics on all
benchmarks.


Figure 5 and Table 4 show the detailed results for the StringFuzz-regex-transformed
benchmark. Z3str3RE solves more instances in total than all other
solvers and has the lowest total running time without timeouts. Including time- 
outs, Z3str3RE is 2.7x faster than CVC4, 1.9x faster than Z3seq, 21x faster 
than OSTRICH, and 27x faster than Z3str3. Although Z3-Trau is 1.5x faster 
than Z3str3RE on this benchmark, including timeouts, Z3-Trau also produces 
1241 answers with soundness errors and crashes on 718 other cases. Z3str3RE 
produces no wrong answers or soundness errors on the benchmark. Z3-Trau also 
solves 1923 fewer cases correctly in total than Z3str3RE. 

Figure 6 and Table 5 show the detailed results for the RegEx-Collected
benchmark. Z3str3RE outperforms Z3seq, Z3str3, OSTRICH, and Z3-Trau on 
this benchmark and is competitive with CVC4 both in terms of total number 
of instances correctly solved and total running time. CVC4 solves 609 more 
instances than Z3str3RE on this benchmark, but Z3str3RE is 1.1x faster over- 
all (including timeouts). Z3str3RE is 3.6x faster than Z3seq, 5.4x faster than 
OSTRICH, 2.4x faster than Z3-Trau, and 2.6x faster than Z3str3. 


5.6 Analysis of Individual Heuristics and Results 


To demonstrate the effectiveness of individual heuristics described in Sect. 4 and 
implemented in Z3str3RE, we evaluated different configurations of the tool in 
which one or more heuristics were disabled. Figure 7 and Table 6 show the results. 
The plot line “Z3str3RE” shows the performance of the tool with all heuristics 
enabled. The plot line “All heuristics off” shows the performance with all heuris- 
tics disabled. Each of the other plot lines shows the performance with the named 
heuristic disabled and all others kept enabled. From the plots and table, it is 
clear that Z3str3RE performs best with all heuristics enabled. Z3str3RE is 4.4x 
faster using all our heuristics than using none. Every other configuration of the
tool performs significantly worse relative to the one with all heuristics enabled.
Also, the length-aware and prefix/suffix heuristics provide a significant boost over
lazy intersections and the baseline. These results demonstrate empirically that
each heuristic we introduce provides significant benefit in both total number of
solved instances and total solver runtime, and that all of the heuristics can be
used simultaneously for maximum efficacy.


Table 6. Comparison of different heuristics in Z3str3RE on all benchmarks.


                        All off       Lazy          Prefix/suffix Automata      Arith. solver Z3str3RE
                                      intersection  off           length info   integ. off
                                      off                         off
Sat                     31046         31486         33817         33816         33804         33820
Unsat                   22090         22085         21880         22264         22131         22339
Unknown                 313           323           287           285           283           291
Timeout                 3807          3362          1272          891           1038          806
Soundness error         0             0             0             0             0             0
Program crashes         42            39            0             1             0             0
Total correct           53136         53571         55697         56080         55935         56159
Time (s)                102102.388    101799.263    40068.501     27178.746     30006.857     23339.266
Time w/o timeouts (s)   25962.388     34559.263     14628.501     9358.746      9246.857      7219.266


6 Related Work 


Comparison with Z3str3: Z3str3 [8] supports regex constraints via (incom- 
plete) reduction to word equations. We have replaced this word-based technique 
with our automata-based approach introduced in this paper. As demonstrated 
by our evaluation, the length-aware automata-based approach used in Z3str3RE 
is more efficient at solving these constraints, and is sound and complete for the
QF theory $T_{LRE}$.


Comparison with Z3’s Sequence Solver: Z3’s sequence solver [18] supports 
a more general theory of “sequences” over arbitrary datatypes, which allows 
it to be used as a string solver. Z3seq uses regular expression derivatives to 
reduce regex constraints without constructing automata. The experiments show 
Z3str3RE performs better than Z3seq overall. 


Comparison with CVC4: The CVC4 solver [24] uses an algebraic approach 
to solving regex constraints. As shown in the experiments, Z3str3RE performs 
better than CVC4, widely considered one of the best SMT solvers for strings
as well as many other theories. 


Comparison with Z3-Trau: The Z3-Trau [1] solver builds on Trau [2], re-implemented
in Z3, and enriched with new ideas, e.g., a more efficient handling
of string-number conversion. The evaluation of Z3-Trau exposed 5325 soundness
errors and 2477 crashes on our benchmarks.


Comparison with OSTRICH: The OSTRICH solver [15] implements a reduction
from straight-line and acyclic fragments of an input formula to the emptiness
problem of alternating finite automata. OSTRICH produced 10901 “unknown” 
responses and 4575 timeouts on our benchmarks, as well as 28 soundness errors. 


Related Algorithms and Theoretical Results: The theory of word equa- 
tions and various extensions have been studied extensively for many decades. 
In 1977, Makanin proved that satisfiability for the QF theory of word equations 
is decidable [28]; in 1999, Plandowski showed that this is in PSPACE [30,31]. 
Schulz [34] extended Makanin’s algorithm to word equations with regex con- 
straints. The satisfiability problem for the theory of word equations with length 
constraints still remains open [20, 28, 29,31], although the status of many other 
extensions of this theory was clarified [17]. Automata-based approaches were 
used to reason about string constraints enhanced with a ReplaceAll function 
[14] or transducers [21]. 

Liang et al. [25] present a formal calculus for a theory that extends $T_{LRE}$
with string concatenation (but not word equations). However, in that paper the 
authors do not present experimental results regarding implementation of the 
string calculus proposed. We have implemented an algorithm based on funda- 
mentals of the theory and standard automata-based constructions, and presented 
a thorough experimental evaluation of our implementation. 

Abdulla et al. [3] present an automata-based solver called Norn built upon 
results involving construction of length constraints from regex constraints. This 
approach differs significantly from our method. In particular, Norn only uses 
automata in inferring length constraints implied by regular expressions, then uses 
an algebraic approach to solve the remainder of the formula. By contrast, our tool 
uses a hybrid approach that includes both algebraic solving and automata-based 
reasoning in a symbiotic loop. In addition, we present several novel heuristics 
using length information to guide the search and, in some cases, avoid construct- 
ing automata or computing intersections. 

The prefix/suffix over-approximation heuristic is inspired partly by the work 
of Brzozowski on regex derivatives [13]. The heuristic we introduce is conceptu- 
ally different as we examine possible prefixes (and suffixes) of strings that could 
be accepted by a regex in order to demonstrate unsatisfiability, rather than 
examining the set of all possible suffixes given a fixed prefix in order to demon- 
strate satisfiability. Our heuristic computes suffixes as well, whereas Brzozowski 
derivatives are traditionally computed with respect to prefixes of a string. Newer 
versions of Z3seq, including the one we evaluated, use a regex algorithm based 
on symbolic derivatives [36]. 


7 Conclusions and Future Work 


In this paper, we empirically showcase the power of length-aware and prefix/suffix
reasoning for regex constraints with our algorithm and its implementation
in Z3str3RE via an extensive empirical comparison against five other state-of-the-art
solvers (namely, CVC4, Z3seq, Z3str3, Z3-Trau, and OSTRICH) over
a large and diverse benchmark of 57256 instances. Over this entire benchmark
suite, we show that Z3str3RE has a speedup of 2.4x over CVC4, 4.4x over Z3seq,
6.4x over Z3-Trau, 9.1x over Z3str3, and 13x over OSTRICH. Our length-aware 
method is very general and has wide applicability in the broad context of string 
solving. In the future, we plan to explore further length-aware heuristics which 
include more expressive functions and predicates, including indexof, substr, 
and string-number conversion.


Abstract. Given an unsatisfiable Boolean formula F in CNF, an unsatisfiable
subset of clauses U of F is called a Minimal Unsatisfiable Subset
(MUS) if every proper subset of U is satisfiable. Since MUSes serve as 
explanations for the unsatisfiability of F, MUSes find applications in a 
wide variety of domains. The availability of efficient SAT solvers has 
aided the development of scalable techniques for finding and enumerat- 
ing MUSes in the past two decades. Building on the recent developments 
in the design of scalable model counting techniques for SAT, Bendík and 
Meel initiated the study of MUS counting techniques. They succeeded 
in designing the first approximate MUS counter, AMUSIC, that does not 
rely on exhaustive MUS enumeration. AMUSIC, however, suffers from 
two shortcomings: the lack of exact estimates and limited scalability due 
to its reliance on 3-QBF solvers. 

In this work, we address the two shortcomings of AMUSIC by design- 
ing the first exact MUS counter, CountMUST, that does not rely on 
exhaustive enumeration. CountMUST circumvents the need for 3-QBF 
solvers by reducing the problem of MUS counting to projected model 
counting. While projected model counting is #NP-hard, the past few 
years have witnessed the development of scalable projected model coun- 
ters. An extensive empirical evaluation demonstrates that CountMUST 
successfully returns MUS count for 1500 instances while AMUSIC and 
enumeration-based techniques could only handle up to 833 instances. 


1 Introduction 


Boolean formulas serve as a primary representation language to model the 
behaviour of systems and properties. Given an unsatisfiable Boolean formula 
F in Conjunctive Normal Form (CNF), i.e. a set of clauses $F = \{f_1, f_2, \dots, f_n\}$,
a subset $U \subseteq F$ is called a Minimal Unsatisfiable Subset (MUS) of F iff U is
unsatisfiable and for every $f \in U$, $U \setminus \{f\}$ is satisfiable.

MUSes serve as explanations or reasons for unsatisfiability of F, and have,
consequently, found applications in a wide variety of domains such as diagno- 
sis [24,56], constrained sampling and counting [28], equivalence checking [20], 
and the like [1,2,25,30,47,64]. While the early applications relied on identify- 
ing a single [3,6,7,51,53] or enumerating multiple [4,10,12,39,41,52] MUSes, 
the rapid adoption of MUSes led researchers to investigate problem formulations
and their corresponding applications that do not rely on explicit MUS
identification. These include, e.g., computing the union of all MUSes [45], deciding
whether a given clause belongs to an MUS [31], or counting the number
of MUSes. In particular, MUS counting has found many applications in the
domain of diagnosis where the MUS count can be used to compute various
inconsistency metrics [25,29,48-50,65] for general propositional knowledge bases.

© The Author(s) 2021

A straightforward, and for many years the only available, approach for count- 
ing MUSes is to simply enumerate them. However, there can be up to exponen- 
tially many MUSes w.r.t. |F| and hence the complete enumeration is often practi- 
cally intractable [9, 10,39, 69]. Inspired by the development of model counting tech- 
niques in the context of SAT, which in its nascent stages also depended on complete 
model enumeration while contemporary techniques often need to explicitly identify
just a fraction of models, Bendík and Meel [13] recently initiated an investigation
of counting MUSes without their explicit enumeration. In this context, they
succeeded by developing a hashing-based approximate counter, AMUSIC [13], that
provides the so-called PAC guarantees, also known as $(\epsilon, \delta)$-guarantees, wherein
the computed answer is within the $(1 + \epsilon)$-factor of the exact count with confidence
at least $1 - \delta$. AMUSIC reduces the problem of MUS counting to logarithmically
many calls to a $\Sigma_3^P$ oracle (3-QBF solver, in practice) wherein every $\Sigma_3^P$ query is
constructed over a CNF formula conjuncted with XORs.

While AMUSIC achieved its stated goal of avoiding explicit enumeration, 
its scalability is significantly hampered by its reliance on a 3-QBF solver that 
can efficiently handle formulas conjuncted with XOR constraints. It is worth 
highlighting that the scalability of model counting techniques [17,60] in the 
context of SAT crucially relies on the availability of CryptoMiniSAT [61], a 
SAT solver with native support for CNF-XOR constraints. Despite significant 
advances in QBF solving over the years, the scalability remains a formidable 
challenge for 3-QBF solvers, and even more when XOR constraints are involved. 
As such, AMUSIC could only scale to formulas involving a few hundred variables
and clauses.

In this work, we focus on addressing the scalability of MUS counting techniques.
We begin our investigation by focusing on the observation of Bendík and
Meel that their technique relied on a $\Sigma_3^P$ oracle even though the problem of finding
an MUS is in $FP^{NP}$ [19,44]. Therefore, a natural direction is to investigate
the design of an algorithmic framework that can circumvent reliance on oracles
with high complexity. In this context, we rely on the observation of Durand,
Hermann, and Kolaitis [21] that the complexity of counting problems whose search
problems have $FP^{NP}$ complexity tends to be #NP (which contains the #P class).
Such an observation is timely given the recent surge of interest in designing
efficient techniques for projected model counting, which is #NP-hard. Therefore,
one wonders whether it is possible to design an MUS counting technique that can
take advantage of projected model counters.

The primary contribution of this paper is an affirmative answer to the above
question. We design a new algorithmic framework, CountMUST, that reduces
the problem of MUS counting to two projected model counting queries. In
particular, CountMUST constructs a wrapper $\mathcal{W}$ and its remainder $\mathcal{R}$ such that the
number of MUSes of F is $|\mathcal{W}| - |\mathcal{R}|$, i.e., the wrapper $\mathcal{W}$ over-approximates the
set of MUSes while the remainder contains the spurious, non-MUS, subsets of
F that emerge due to the over-approximation. We encode the wrapper $\mathcal{W}$ and
the remainder $\mathcal{R}$ with Boolean formulas $\mathbb{W}$ and $\mathbb{R}$ such that the projected model
counts for $\mathbb{W}$ and $\mathbb{R}$ (for a suitable projection set) are equal to $|\mathcal{W}|$ and $|\mathcal{R}|$,
respectively. An interesting (and perhaps surprising) aspect of CountMUST is that
we do not enumerate a single MUS in our process, which is in stark contrast
to the design of AMUSIC that relies on the enumeration of a small number of
MUSes.

We discuss several strategies to construct wrappers (and their corresponding 
remainders) that are efficient to compute and are tight over-approximations of 
the set of MUSes. We conduct a detailed empirical analysis over 2553 instances 
and observe that CountMUST successfully returns MUS count for 1500 instances 
while AMUSIC and enumeration-based techniques could only handle up to 833 
instances. We observe an interesting complementary nature of the exact and
approximate MUS counting approaches: the scalability of AMUSIC is often impacted by
the number of clauses and appears to be less impacted by the number of MUSes 
while, on the other hand, the scalability of CountMUST is less impacted by the 
number of clauses and appears to depend on the number of MUSes. 

Finally, our empirical analysis showcases that our wrappers W approximate 
the set of MUSes very tightly. Motivated by the tightness of our wrappers, we dis- 
cuss several interesting applications of our framework: approximate MUS count- 
ing [13], MUS enumeration [5,40], MUS Sampling, estimation of minimum and 
maximum MUS cardinality [27,38], and MUS membership testing [31]. 

The rest of the paper is organized as follows. We introduce preliminaries in 
Sect. 2 and discuss related work in Sect. 3. We then present the primary technical 
contribution of our work in Sect. 4. We present the empirical evaluation in Sect. 5 
and then discuss the implications of the tightness of our wrappers in Sect. 6. We 
finally conclude in Sect. 7. 


2 Preliminaries and Problem Definition 


A Boolean formula F is built over Boolean values {1,0} and over a set Vars(F)
of Boolean variables connected via standard logical operators: $\wedge, \vee, \rightarrow, \leftrightarrow, \neg$. A
literal is either a variable $x \in Vars(F)$ or its negation $\neg x$; Lits(F) denotes the
set of all literals used in F. Given a set A of variables, a valuation $\pi : A \rightarrow \{1,0\}$
assigns to each variable its Boolean value. $F[\pi]$ denotes the formula that emerges
from F by substituting every variable x of F that is in the domain of $\pi$ by $\pi(x)$;
furthermore, trivial simplifications, e.g., $G \vee 0 = G$, $G \wedge 0 = 0$, $\neg 1 = 0$, $\neg 0 = 1$,
are applied. Note that if $A \supseteq Vars(F)$, then $F[\pi]$ is simplified either to 1 or to 0.
In the case when $A \supseteq Vars(F)$ and $F[\pi] = 1$, we call $\pi$ a model of F and write
$\pi \models F$; otherwise, when $F[\pi] = 0$, we write $\pi \not\models F$. A formula F is satisfiable if
it has a model; otherwise, F is unsatisfiable. We write $M_F$ to denote the set of all
models of F. Moreover, given a set $A \subseteq Vars(F)$ of variables, we write $M_{F \downarrow A}$ to
denote the projection of $M_F$ on A, and for every $\pi \in M_F$, we write $\pi_{\downarrow A}$ to denote
the projection of $\pi$ on A. Finally, given two variable sets, $A = \{a_1, \dots, a_p\}$ and
$B = \{b_1, \dots, b_p\}$, such that $A \subseteq Vars(F)$, we write $F[A/B]$ to denote the formula
that originates from F by substituting each variable $a_i \in A$ by $b_i \in B$.


316 J. Bendík and K. S. Meel

A formula in conjunctive normal form, shortly a CNF formula, is a conjunc- 
tion of clauses where a clause is a disjunction of literals. When suitable, a CNF 
formula can also be viewed as a multiset of clauses where a clause is a set of 
literals; we use the two representations interchangeably based on the context. 
Throughout the whole text, let $F = \{f_1, \dots, f_n\}$ denote the input CNF
formula of interest. Furthermore, capital letters, e.g., $S, K, N$, or blackboard
bold letters, e.g., $\mathbb{W}, \mathbb{R}$, are used to denote other formulas, small letters, e.g.,
$f, f_i, f_j$, are used to denote clauses, and small letters, e.g., $x, x', y$, are used to
denote variables. Finally, given a set X, P(X) denotes the power-set of X, and 
|X| denotes the cardinality of X. 


Definition 1 (MUS). A subset N of F is a minimal unsatisfiable subset
(MUS) of F iff N is unsatisfiable and for every $f \in N$ it holds that $N \setminus \{f\}$ is
satisfiable.


Definition 2 (MSS). A subset N of F is a maximal satisfiable subset (MSS)
of F iff N is satisfiable and for every $f \in F \setminus N$ it holds that $N \cup \{f\}$ is
unsatisfiable.


Definition 3 (MCS). A subset N of F is a minimal correction subset (MCS)
of F iff $F \setminus N$ is satisfiable and for every $f \in N$ it holds that $F \setminus (N \setminus \{f\})$ is
unsatisfiable. Equivalently, N is an MCS iff $F \setminus N$ is an MSS.


Note that Boolean satisfiability is monotone w.r.t. (clause) subset
inclusion, i.e., all subsets of a satisfiable set of clauses are satisfiable.
Consequently, all proper subsets of an MUS are in fact satisfiable, and, dually,
all proper supersets of an MSS are unsatisfiable. Also, note that the
minimality/maximality concept used here is set minimality/maximality and not
minimum/maximum cardinality. Consequently, there can be up to $\binom{n}{\lfloor n/2 \rfloor}$
MUSes/MCSes/MSSes of F (intuitively, this is the maximum number of pairwise
incomparable subsets of F; see Sperner's theorem [62]). We write maximum and
minimum MUS to denote an MUS with the maximum and the minimum
cardinality, respectively. Note that there can also be exponentially many maximum
and minimum MUSes. We write $MUS_F$ to denote the set of all MUSes of F, and
$SS_F$ to denote the set of all satisfiable subsets of F.


Example 1. Let us demonstrate the concepts of MUSes, MSSes and MCSes on
an example. Assume that $F = \{f_1 = \{x_1\}, f_2 = \{\neg x_1\}, f_3 = \{x_2\}, f_4 = \{\neg x_1, \neg x_2\}\}$.
There are 2 MUSes: $MUS_F = \{\{f_1, f_3, f_4\}, \{f_1, f_2\}\}$, 3 MSSes:
$\{\{f_2, f_3, f_4\}, \{f_1, f_3\}, \{f_1, f_4\}\}$, and thus also 3 MCSes: $\{\{f_1\}, \{f_2, f_4\}, \{f_2, f_3\}\}$.


For illustration, see Fig. 1. 
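
The sets listed in Example 1 can be reproduced by brute force. The following is a minimal Python sketch, assuming a DIMACS-style encoding of literals (1 stands for x_1, -1 for its negation); the helper names is_sat, all_subsets, muses, msses and mcses are illustrative and do not come from any tool discussed here. It enumerates all subsets of F, so it is only meant for tiny inputs.

```python
from itertools import combinations, product

# Example 1 in DIMACS-style literals: 1 means x1, -1 means "not x1".
F = {"f1": [1], "f2": [-1], "f3": [2], "f4": [-1, -2]}
VARS = sorted({abs(lit) for clause in F.values() for lit in clause})


def is_sat(names):
    """Brute-force satisfiability check of the clauses with the given names."""
    for bits in product([False, True], repeat=len(VARS)):
        val = dict(zip(VARS, bits))
        if all(any(val[abs(l)] == (l > 0) for l in F[n]) for n in names):
            return True
    return False


def all_subsets(names):
    """Yield every subset of the given clause names as a frozenset."""
    names = sorted(names)
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            yield frozenset(combo)


def muses():
    # Definition 1: unsatisfiable, and removing any clause makes it satisfiable.
    return {s for s in all_subsets(F)
            if not is_sat(s) and all(is_sat(s - {f}) for f in s)}


def msses():
    # Definition 2: satisfiable, and adding any other clause makes it unsatisfiable.
    return {s for s in all_subsets(F)
            if is_sat(s) and all(not is_sat(s | {f}) for f in set(F) - s)}


def mcses():
    # Definition 3: N is an MCS iff F \ N is an MSS.
    return {frozenset(F) - s for s in msses()}
```

Calling muses(), msses() and mcses() yields sets of frozensets of clause names that can be compared directly with the sets listed in the example.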


In this paper, we are concerned with the following two problems. 
Name: #MUS 
Input: A CNF formula F. 
Output: The number $|MUS_F|$ of MUSes of F.


Counting Minimal Unsatisfiable Subsets 317 
Fig. 1. Illustration of $P(F)$ from Example 1. Individual subsets are represented
as bit-vectors, e.g., $\{f_1, f_2\}$ is written as 1100. The subsets with a dashed border are
the unsatisfiable subsets, and the others are satisfiable subsets. MUSes and MSSes are
filled with a background colour.


Name: proj-#SAT
Input: A formula G and a set of variables $S \subseteq Vars(G)$.
Output: The number $|M_{G \downarrow S}|$ of models of G projected on S.


Our goal is to solve the #MUS problem, and to do that, we propose a strong 
subtractive reduction to the proj-#SAT problem. 


Definition 4 (Strong Subtractive Reductions). [21] Let $\Sigma$ be an alphabet
and let $Q_1$ and $Q_2$ be two binary relations over $\Sigma$. Let $\#\cdot Q_1$ and $\#\cdot Q_2$ represent
the corresponding counting problems. Then, $\#\cdot Q_1$ reduces to $\#\cdot Q_2$ via a strong
subtractive reduction, if there exist polynomial-time computable functions f and
g such that for every string $x \in \Sigma^*$:


1. $Q_2(f(x)) \subseteq Q_2(g(x))$
2. $|Q_1(x)| = |Q_2(g(x))| - |Q_2(f(x))|$.


3 Related Work 


MUS Counting. A straightforward approach to count the MUSes is to simply
enumerate them via an MUS enumeration algorithm, e.g. [4,5,8,10,12,39,41,52].
However, since there can be up to exponentially many MUSes w.r.t. |F|, the 
complete enumeration is often practically intractable. An alternative approach 
to identify the MUS count is based on a so-called minimal hitting set duality 
between MUSes and MCSes that states that every MUS is a minimal hitting set 
of the set of all MCSes [32,56]. Consequently, one can determine the MUS count 
by first identifying all MCSes and then counting their minimal hitting sets [40]. 
However, there can be in general up to exponentially many MCSes, which makes 
this approach also often practically intractable [11,52]. 
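
The hitting set duality can be illustrated on the formula of Example 1. The sketch below is a naive Python enumeration of minimal hitting sets, written only for illustration; it is not the algorithm of [40], and the function name minimal_hitting_sets is hypothetical. Given the three MCSes of the example, it returns exactly the two MUSes.

```python
from itertools import combinations

# MCSes of Example 1, written as sets of clause names.
MCSES = [{"f1"}, {"f2", "f4"}, {"f2", "f3"}]


def minimal_hitting_sets(sets):
    """Enumerate all minimal hitting sets by increasing cardinality."""
    universe = sorted(set().union(*sets))
    found = []
    for k in range(len(universe) + 1):
        for combo in combinations(universe, k):
            cand = set(combo)
            hits_all = all(cand & s for s in sets)
            # Smaller hitting sets were found earlier, so a candidate is
            # minimal iff it contains none of them.
            if hits_all and not any(h <= cand for h in found):
                found.append(cand)
    return found
```

The exponential loop over all candidate subsets mirrors the remark above that this route is often intractable in practice.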

The study of MUS counting without relying on exhaustive enumeration
was initiated just recently by Bendík and Meel [13], who proposed an
$(\epsilon, \delta)$-approximation scheme called AMUSIC. AMUSIC extends a prior hashing-based
model counting framework [15,18,63] to MUS counting. Briefly, AMUSIC divides
the power-set $P(F)$ into nCells small cells, then picks one of the cells and counts
the number inCell of MUSes in the cell, and estimates the overall MUS count
as nCells × inCell. The approach requires performing logarithmically many calls
to a $\Sigma_3^P$ oracle (3-QBF solver) wherein each query consists of a CNF formula
conjuncted with XOR constraints. The lack of solvers with native support for
such constraints presents the major hindrance to the scalability of AMUSIC.

It is worth remarking on a recent work by Bendík and Meel [14] that focuses 
on exact counting of maximal satisfiable subsets (MSSes). While MUSes and 
MSSes are closely related concepts, to the best of our knowledge, there does not 
exist any efficient reduction from MUS counting to MSS counting, or vice versa. 
Note that the best known upper bound on the problem of finding an MUS is
$FP^{NP}$ [19], whereas for finding an MSS a tighter upper bound $FP^{NP}[wit, log]$
is known [44], which suggests that counting MUSes is practically harder than
counting MSSes. It would be an interesting question for future work if the counter 
developed in this work can be employed to perform MSS counting. 


Model Counting. The complexity-theoretic study of model counting was initiated 
by Valiant [67] who showed that proj-#SAT is #P-complete when S = Vars(G). 
Subsequently, Durand, Hermann, and Kolaitis [21] showed that the general problem
of proj-#SAT is #NP-hard. A significant conceptual contribution of Durand
et al. was to show the importance of subtractive reductions for problems in #NP;
this idea has been applied for reductions to projected counting [14].

Our work relies on the recent progress in the development of efficient pro- 
jected model counters; in particular, we employ GANAK [59], a state-of-the-art 
search-based exact model counter; the entry based on GANAK won the projected
model counting track in the 2020 Model Counting Competition [23]. Search-based
model counters build on three core ideas: (1) for a formula G and $x \in S$, we
have $|M_{G \downarrow S}| = |M_{G(x \mapsto 0) \downarrow S}| + |M_{G(x \mapsto 1) \downarrow S}|$, (2) if G can be partitioned into
subsets of clauses $\{C_1, C_2, \dots, C_k\}$ such that $\forall i \neq j.\ Vars(C_i) \cap Vars(C_j) = \emptyset$, then we
have $|M_{G \downarrow S}| = \prod_{i=1}^{k} |M_{C_i \downarrow S}|$, and (3) finally, component caching is employed to
cache the components. Consequently, the model count can be often determined
by explicitly identifying just a fraction of all models. GANAK is built on top of 
earlier search-based model counters, sharpSAT [66] and Cachet [57,58]. 
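
Idea (1) can be sketched in a few lines of Python. The code below is a toy illustration and not how GANAK, sharpSAT or Cachet are implemented: it branches only on the projection variables and, once they are all assigned, asks a brute-force check whether the residual formula is satisfiable. Component decomposition (2) and caching (3) are deliberately omitted, and the names simplify, is_sat and count_projected are hypothetical.

```python
from itertools import product


def simplify(clauses, var, value):
    """Return G(var -> value): drop satisfied clauses, shrink the others."""
    true_lit = var if value else -var
    return [[l for l in c if l != -true_lit] for c in clauses if true_lit not in c]


def is_sat(clauses):
    """Brute-force satisfiability check on the remaining variables."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


def count_projected(clauses, proj):
    """Projected model count via rule (1): branch on projection variables only."""
    if any(len(c) == 0 for c in clauses):
        return 0  # an empty clause: this branch has no model
    if not proj:
        return 1 if is_sat(clauses) else 0
    x, rest = proj[0], proj[1:]
    return (count_projected(simplify(clauses, x, False), rest)
            + count_projected(simplify(clauses, x, True), rest))
```

For instance, for G = (x1 or x2) and (not x1 or x3), given as [[1, 2], [-1, 3]], the call count_projected(G, [1, 2]) counts the valuations of x1 and x2 that extend to a model of G.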


4 MUS Counting via a Projected Model Counter 


We now gradually introduce several subtractive reductions of the MUS counting
problem to projected model counting, starting with the base idea in Sect. 4.1,
and following with the particular reductions in Sects. 4.2-4.11.


4.1 Basic MUS Counting Idea 


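Definition 5 (wrapper and remainder). A set $\mathcal{W}$ of subsets of F is a
wrapper iff $MUS_F \subseteq \mathcal{W} \subseteq MUS_F \cup SS_F$. Furthermore, the remainder of $\mathcal{W}$ is the
set $\mathcal{R} = \mathcal{W} \cap SS_F$.


Proposition 1. Let $\mathcal{W}$ be a wrapper and $\mathcal{R}$ its corresponding remainder. Then
$|MUS_F| = |\mathcal{W}| - |\mathcal{R}|$.


Proof. Since $\mathcal{R} = \mathcal{W} \cap SS_F$ and no MUS is satisfiable, $MUS_F \cap \mathcal{R} = \emptyset$. Moreover,
$MUS_F \subseteq \mathcal{W} \subseteq MUS_F \cup SS_F$ gives $\mathcal{W} = MUS_F \cup \mathcal{R}$, and hence $|\mathcal{W}| = |MUS_F| + |\mathcal{R}|$.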


Our approach to determine the MUS count $|MUS_F|$ consists of the following
steps. First, we define a wrapper $\mathcal{W}$ and its corresponding remainder $\mathcal{R}$.
Subsequently, we encode the wrapper $\mathcal{W}$ with a Boolean formula $\mathbb{W}$ such that each
projected model of $\mathbb{W}$ (for a suitable projection set) corresponds to an element
of $\mathcal{W}$. Similarly, we construct a Boolean formula $\mathbb{R}$ such that each projected
model of $\mathbb{R}$ corresponds to an element of the remainder $\mathcal{R}$. Finally, we employ a
projected model counter to determine the projected model counts of $\mathbb{W}$ and $\mathbb{R}$,
i.e., $|\mathcal{W}|$ and $|\mathcal{R}|$, and hence we obtain the MUS count $|MUS_F| = |\mathcal{W}| - |\mathcal{R}|$.

In the following, we first describe in Sect. 4.2 how to build a simple wrapper
$\mathcal{W}_1$ and its remainder $\mathcal{R}_1$ and how to encode them via Boolean formulas $\mathbb{W}_1$ and
$\mathbb{R}_1$, respectively. Subsequently, in Sects. 4.3-4.11, we propose several additional
wrappers (and their remainders) that improve upon the base wrapper $\mathcal{W}_1$ by
exploiting various observations about MUSes. Finally, in Sect. 4.12, we show
how to combine the individual wrappers.


4.2 $\mathcal{W}_1$ - the Base Wrapper and Its Remainder


Our base wrapper, $\mathcal{W}_1$, is simply the set of all satisfiable subsets and all MUSes
of F, i.e., $\mathcal{W}_1 = SS_F \cup MUS_F$. The corresponding remainder $\mathcal{R}_1$ is thus the set
$SS_F$ of all satisfiable subsets of F. In the following, we describe how to encode
the wrapper $\mathcal{W}_1$ and the remainder $\mathcal{R}_1$ via Boolean formulas $\mathbb{W}_1$ and $\mathbb{R}_1$ whose
projected models correspond to elements of $\mathcal{W}_1$ and $\mathcal{R}_1$, respectively.

Let us start with encoding the remainder $\mathcal{R}_1 = SS_F$. Given the unsatisfiable
formula $F = \{f_1, \dots, f_n\}$, we introduce a set $\mathcal{A} = \{a_1, \dots, a_n\}$ of activation
variables. Note that every valuation $\pi$ of $\mathcal{A}$ one-to-one maps to an activated
subset $\pi_{\mathcal{A},F}$ of F defined as $\pi_{\mathcal{A},F} = \{f_i \in F \mid \pi(a_i) = 1\}$. Using the activation
variables, we build the formula $\mathbb{R}_1$ as follows:


$\mathbb{R}_1 = \bigwedge_{f_i \in F} (a_i \rightarrow f_i)$ (1)

Intuitively, if we set $a_i$ to 0 then the formula $a_i \rightarrow f_i$ is trivially satisfied,
and if we set $a_i$ to 1 then $f_i$ has to be satisfied to satisfy $a_i \rightarrow f_i$. Hence, the
models of $\mathbb{R}_1$ projected on $\mathcal{A}$ map to satisfiable subsets of F; formally:


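Proposition 2. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in M_{\mathbb{R}_1 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathcal{R}_1 = SS_F$.
Consequently, $|M_{\mathbb{R}_1 \downarrow \mathcal{A}}| = |\mathcal{R}_1|$.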
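
As an illustration of Eq. 1 and Proposition 2, the following Python sketch builds the clauses of the remainder formula for the formula of Example 1 and counts its models projected on the activation variables by brute force. The numbering of the activation variables (a_i is variable n_vars + i) and the helper names encode_R1, satisfies and projected_count are assumptions made for this sketch; an actual implementation would hand the CNF to a projected model counter instead of enumerating valuations.

```python
from itertools import product

# Example 1: f1 = {x1}, f2 = {not x1}, f3 = {x2}, f4 = {not x1, not x2}.
F = [[1], [-1], [2], [-1, -2]]
N_VARS = 2  # x1, x2


def encode_R1(clauses, n_vars):
    """CNF of Eq. 1: one clause (not a_i or f_i) per f_i; a_i is variable n_vars + i."""
    activation = [n_vars + i for i in range(1, len(clauses) + 1)]
    cnf = [[-a] + list(f) for a, f in zip(activation, clauses)]
    return cnf, activation


def satisfies(cnf, val):
    """True iff the total valuation val satisfies every clause of cnf."""
    return all(any(val[abs(l)] == (l > 0) for l in c) for c in cnf)


def projected_count(cnf, proj, others):
    """Brute force: count valuations of proj that extend to a model of cnf."""
    count = 0
    for p_bits in product([False, True], repeat=len(proj)):
        val = dict(zip(proj, p_bits))
        if any(satisfies(cnf, {**val, **dict(zip(others, o_bits))})
               for o_bits in product([False, True], repeat=len(others))):
            count += 1
    return count


R1, A = encode_R1(F, N_VARS)
n_satisfiable_subsets = projected_count(R1, A, [1, 2])
```

Here n_satisfiable_subsets is the number of satisfiable subsets of F, since each valuation of the activation variables selects one subset of clauses.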


Let us note that the concept of activation variables (or alternatively relaxation
variables) and the idea behind the formula $\mathbb{R}_1$ is not novel and it appeared
also in several MUS/MSS/MCS related studies such as [14,31,42]. However, we
are the first to apply it in the context of MUS counting.

To build a formula $\mathbb{W}_1$ that represents the wrapper $\mathcal{W}_1 = SS_F \cup MUS_F$, we
will proceed similarly, i.e., we build $\mathbb{W}_1$ using the activation variables $\mathcal{A}$ in such
a way that a valuation $\pi$ of $\mathcal{A}$ is a projected model of $\mathbb{W}_1$ iff $\pi_{\mathcal{A},F} \in \mathcal{W}_1$. A
straightforward approach to encode $\mathcal{W}_1$ is to directly express that we are
interested either in satisfiable subsets or MUSes of F. Such an encoding might look as
$\mathbb{R}_1(\mathcal{A}) \vee isMUS(\mathcal{A})$ where $\mathbb{R}_1(\mathcal{A})$ is the formula from Eq. 1 encoding that $\pi_{\mathcal{A},F}$ is
satisfiable and $isMUS(\mathcal{A})$ is a formula encoding that $\pi_{\mathcal{A},F}$ is an MUS. However,
encoding that a set S is an MUS is quite expensive since one has to express
that all proper subsets of S are satisfiable and that S is unsatisfiable (Definition 1).
Especially, encoding that a set S is unsatisfiable requires considering all the
exponentially many valuations of Vars(S). Several MUS related studies used various
QBF encodings for the property of being an MUS, e.g., [13,31]. In particular, to
express that a set S is an MUS, one can use the following, intuitively described,
$\forall\exists$-QBF encoding: "for every valuation $\pi$ of Vars(S) the valuation $\pi$ models $\neg S$
(i.e., S is unsatisfiable) and for every proper subset S' of S there exists a valuation
$\pi'$ of Vars(S') that satisfies S'". One could convert the $\forall\exists$-QBF encoding into
a plain Boolean formula by explicitly enumerating all the possible valuations of
Vars(S) and all the subsets of S; however, this yields an exponentially large,
and thus intractable, formula. Hence, instead of directly expressing that every
element of the wrapper $\mathcal{W}_1$ is either a satisfiable subset or an MUS of F, we
propose another approach based on a novel concept of an evidence.


Definition 6 (evidence). Let A be a subset of $F = \{f_1, \dots, f_n\}$. An evidence
for A is a tuple $(\rho_1, \dots, \rho_n)$ such that for every $1 \leq i \leq n$ it holds that:


1. $\rho_i : Vars(F) \rightarrow \{1,0\}$ is a truth assignment, and
2. if $f_i \in A$, then $\rho_i \models A \setminus \{f_i\}$.


Crucially, we observe the following: 


Proposition 3. For every subset A of F it holds that $A \in SS_F \cup MUS_F = \mathcal{W}_1$ iff
there exists an evidence for A.


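Our formula $W_1$ (Eq. 2) that encodes the wrapper $\mathbb{W}_1$ captures every set
$A \subseteq F$ for which there exists an evidence $(\rho_1, \ldots, \rho_n)$. To represent the set
A, we use the activation variables $\mathcal{A} = \{a_1, \ldots, a_n\}$. To represent the truth
assignments $\rho_1, \ldots, \rho_n$, we introduce variable sets $\mathcal{I}_1, \ldots, \mathcal{I}_n$ where $\mathcal{I}_i$ is a fresh
copy of Vars(F) for every $i \in \{1, \ldots, n\}$.

$$W_1 = \bigwedge_{a_i \in \mathcal{A}} \Big( a_i \rightarrow \bigwedge_{j \in \{1,\ldots,n\} \setminus \{i\}} \big( a_j \rightarrow f_j[Vars(F)/\mathcal{I}_i] \big) \Big) \qquad (2)$$

Intuitively, let $\pi$ be a valuation of $Vars(W_1)$ and $\pi_{\mathcal{A},F} = \{f_i \in F \mid \pi(a_i) = 1\}$
the subset of F activated by $\mathcal{A}$. For every activated clause $f_i \in \pi_{\mathcal{A},F}$, the
formula expresses that $\pi_{\mathcal{I}_i}$ is a model of $\pi_{\mathcal{A},F} \setminus \{f_i\}$ where the variable set
Vars(F) is substituted by $\mathcal{I}_i$.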
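The construction of Eq. 2 can be illustrated on a tiny instance. The following Python sketch is only an illustration under our own assumptions: clauses are tuples of DIMACS-style integer literals, the activation variables are numbered 1..n, the i-th fresh copy of Vars(F) is placed after them, and the helper names `w1_clauses` and `projected_models` are hypothetical. The brute-force enumeration merely stands in for a projected model counter and is feasible only for very small formulas.

```python
from itertools import product

def w1_clauses(F):
    """Return (cnf, num_vars) for the formula of Eq. 2.

    Activation variable a_i has id i; copy i of variable v has id n + (i-1)*m + v.
    """
    n = len(F)
    m = max(abs(l) for c in F for l in c)

    def copy(lit, i):
        v = n + (i - 1) * m + abs(lit)
        return v if lit > 0 else -v

    cnf = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if j != i:
                # a_i -> (a_j -> f_j over the i-th copy), written as one clause
                cnf.append([-i, -j] + [copy(l, i) for l in F[j - 1]])
    return cnf, n + n * m

def projected_models(cnf, num_vars, proj):
    """Brute-force set of models of cnf projected onto the variables in proj."""
    seen = set()
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in cnf):
            seen.add(tuple(bits[v - 1] for v in proj))
    return seen

# F = {x1}, {not x1}, {x2}, {not x1 or not x2}
F = [(1,), (-1,), (2,), (-1, -2)]
cnf, num_vars = w1_clauses(F)
wrapper = projected_models(cnf, num_vars, range(1, len(F) + 1))
```

For this F there are 11 satisfiable subsets and 2 MUSes, so the projection onto the activation variables is expected to contain 13 valuations.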


Proposition 4. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_1 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_1 =
SS_F \cup MUS_F$. Consequently, $|\mathcal{M}_{W_1 \downarrow \mathcal{A}}| = |\mathbb{W}_1|$.

Based on Propositions 2 and 4, we can now employ a projected model counter
to obtain the model counts $|\mathcal{M}_{W_1 \downarrow \mathcal{A}}|$ and $|\mathcal{M}_{R_1 \downarrow \mathcal{A}}|$, which yields $|\mathbb{W}_1|$ and $|\mathbb{R}_1|$,
and hence also $|MUS_F|$ (Proposition 1). However, the concern here is the tractability
of obtaining the model counts. There are mainly two criteria that affect the
practical tractability of projected model counting. One criterion is the number
of projected models, i.e., the cardinality of the wrapper (and the remainder), and
the other criterion is the cardinality of the projection set, i.e., $|\mathcal{A}|$. The wrapper
$\mathbb{W}_1$ is not very efficient w.r.t. these two criteria. In particular, $\mathbb{W}_1$ contains all
satisfiable subsets of F, and there are often exponentially many satisfiable subsets
of F w.r.t. |F|. Therefore, in the following, we present nine additional wrappers,
$\mathbb{W}_2, \ldots, \mathbb{W}_{10}$, and their corresponding remainders. Each of the wrappers
captures a property of MUSes that allows us to provide a better description of
MUSes, and hence reduce the cardinality of the wrapper and/or the cardinality
of the projection set. As in the case of $\mathbb{W}_1$, we use the activation
variables $\mathcal{A}$ to represent the elements of the wrappers/remainders. Moreover,
each of the following wrappers $\mathbb{W}_i$ is encoded by a Boolean formula $W_i$
such that for every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_i \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_i$ (and similarly
for the remainders).


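4.3 $\mathbb{W}_2$ - The Intersection of MUSes


Our second wrapper $\mathbb{W}_2$ is based on a simple observation: every MUS of F has to
contain the intersection $IMUS_F$ of all MUSes of F. Hence, we define the wrapper
as $\mathbb{W}_2 = \{N \in \mathbb{W}_1 \mid N \supseteq IMUS_F\}$ and encode it via $W_2$ as follows:

$$W_2 = W_1 \wedge \bigwedge_{f_i \in IMUS_F} a_i \qquad (3)$$

Proposition 5. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_2 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_2$.
Consequently, $|\mathcal{M}_{W_2 \downarrow \mathcal{A}}| = |\mathbb{W}_2|$.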


The remainder $\mathbb{R}_2$ of $\mathbb{W}_2$ is by Definition 5 the set $\mathbb{W}_2 \cap SS_F$. To build the
formula $R_2$ that encodes $\mathbb{R}_2$, observe that we already have an encoding for the
set $\mathbb{W}_2$ (Eq. 3), and we also have an encoding for the set $SS_F$ since $SS_F = \mathbb{R}_1$.
Hence, we can build $R_2$ as a conjunction of the two encodings: $R_2 = W_2 \wedge R_1$.
Note that this construction of the remainder and the formula that encodes it is
purely mechanical and does not involve any specific property of the particular
wrapper. Therefore, for every wrapper $\mathbb{W}_i$ and its encoding $W_i$ that are presented
in the following sections, we define the remainder as $\mathbb{R}_i = \mathbb{W}_i \cap \mathbb{R}_1$ and encode it
as $R_i = W_i \wedge R_1$. Proposition 6 witnesses the soundness of this construction:


Proposition 6. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{R_i \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{R}_i$.
Consequently, $|\mathcal{M}_{R_i \downarrow \mathcal{A}}| = |\mathbb{R}_i|$.


This section's final question is how to compute the intersection $IMUS_F$. It is
well-known that a clause $f \in F$ belongs to $IMUS_F$ iff $F \setminus \{f\}$ is satisfiable (see,
e.g., [32,40,56]). Hence, a straightforward way would be to perform such a
satisfiability check for each $f \in F$; however, that might be very expensive. Fortunately,
a quite efficient algorithm for computing $IMUS_F$ has recently been proposed [13];
it usually requires only a few satisfiability checks, so we implemented
that algorithm and use it while building the wrapper.
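For illustration only, the straightforward (one check per clause) computation of the intersection can be sketched in Python as follows. This is not the algorithm of [13]; the brute-force `is_sat` helper, the function name `intersection_of_muses`, and the integer-literal clause representation are our own assumptions, and F is assumed to be unsatisfiable.

```python
from itertools import product

def is_sat(clauses):
    """Brute-force SAT check over clauses given as tuples of integer literals."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def intersection_of_muses(F):
    """Indices i such that F without f_i is satisfiable (F assumed unsatisfiable)."""
    return {i for i in range(len(F)) if is_sat(F[:i] + F[i + 1:])}

# F = {x1}, {not x1}, {x2}, {not x1 or not x2}; its MUSes are {f0, f1} and {f0, f2, f3}
F = [(1,), (-1,), (2,), (-1, -2)]
imus = intersection_of_muses(F)
```

Here only the clause {x1} lies in every MUS, so its activation variable would be fixed to 1 in Eq. 3.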


4.4 $\mathbb{W}_3$ - The Union of MUSes


Our next wrapper, $\mathbb{W}_3$, is very similar to the previous wrapper. Observe that
every MUS of F is necessarily a subset of the union $UMUS_F$ of all MUSes of
F. Consequently, a weaker observation also holds: every MUS of F is a subset
of every over-approximation of $UMUS_F$. We define the wrapper as $\mathbb{W}_3 = \{N \in
\mathbb{W}_1 \mid N \subseteq U\}$ where U is either the exact union $UMUS_F$ or its over-approximation
($U \supseteq UMUS_F$). Details on obtaining U are provided below. The encoding $W_3$ of
$\mathbb{W}_3$ is analogous to $W_2$:

$$W_3 = W_1 \wedge \bigwedge_{f_i \notin U} \neg a_i \qquad (4)$$

Proposition 7. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_3 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_3$.
Consequently, $|\mathcal{M}_{W_3 \downarrow \mathcal{A}}| = |\mathbb{W}_3|$.


The computation of the union $UMUS_F$ has been examined in two recent studies
[13,45] that provided two different approaches for that task. Unfortunately,
due to the problem's hardness, both studies showed that the proposed
approaches can usually handle only relatively small input formulas. Namely,
the approach from [13] requires O(|F|) calls of a XP oracle. Fortunately, it is 
often possible to cheaply compute a good over-approximation of $UMUS_F$ via the
concepts of autark variables and a lean kernel. Briefly, a subset V of Vars(F)
is an autark [46] of F iff there exists a valuation $\pi$ of V such that for every
clause $f \in F$ that contains a variable from V it holds that $\pi \models f$. Since a union
of two autark sets is also an autark set, there exists a unique maximum autark
set [33,34]. The lean kernel K of F is the set of clauses that do not use any
variable from the maximum autark set. It has been shown (e.g., [33,34]) that
the lean kernel is an over-approximation of $UMUS_F$. Hence, when building the
wrapper $\mathbb{W}_3$, we use the lean kernel K as the over-approximation U of $UMUS_F$,
i.e., $\mathbb{W}_3 = \{N \in \mathbb{W}_1 \mid N \subseteq K\}$. Several algorithms have been proposed
to compute the lean kernel, e.g., [36,43]; we have implemented the algorithm by
Marques-Silva et al. [43] using the MaxSAT solver UWrMaxSat [54] as a back-end.

A few words are in order about the effect of the two wrappers, $\mathbb{W}_2$ and $\mathbb{W}_3$, on the
tractability of the projected model counting. Observe that in both cases ($W_2$ and
$W_3$), we fix the values of some variables from the projection set $\mathcal{A}$. Hence, before
passing the formulas to the projected model counter, we first propagate the fixed
values of $\mathcal{A}$ to simplify the formulas. By doing so, we effectively reduce the size
of the projection set $\mathcal{A}$ by $|IMUS_F|$ and $|F \setminus U| = |F \setminus K|$, respectively.

Finally, let us note that the fact that an MUS has to be a subset of the
union of all MUSes and a superset of the intersection of all MUSes is well-known,
and it has already been exploited in various ways in several MUS related studies
(see, e.g., [10,11,45]). In particular, the approximate MUS counting algorithm
AMUSIC [13] utilizes $UMUS_F$ in its preprocessing phase, and $IMUS_F$ to simplify
3-QBF queries while searching for MUSes.


4.5 $\mathbb{W}_4$ - Minimum MUS Cardinality


Assume we can somehow compute the cardinality of a minimum MUS or at least
its lower-bound minMUS. Knowing this number, we define our next wrapper as
$\mathbb{W}_4 = \{N \in \mathbb{W}_1 \mid |N| \ge minMUS\}$. To encode this wrapper via a formula $W_4$, we
employ a Boolean cardinality constraint $atLeast(\mathcal{A}, minMUS)$ expressing that at
least minMUS variables from $\mathcal{A}$ are set to 1:

$$W_4 = W_1 \wedge atLeast(\mathcal{A}, minMUS) \qquad (5)$$

Proposition 8. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_4 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_4$.
Consequently, $|\mathcal{M}_{W_4 \downarrow \mathcal{A}}| = |\mathbb{W}_4|$.


Several algorithms have been proposed for computing an MUS with
the minimum cardinality, e.g., [26,27,38]. However, since the task of computing
a minimum MUS is in $FP^{\Sigma_2^P}$ [27,37], computing a minimum MUS exactly
is too expensive for our scenario (as we experienced empirically). Instead, we
propose an approach for cheaply computing a lower-bound on the minimum MUS
cardinality.

Our method is based on a well-known relationship between MUSes and
MCSes called minimal hitting set duality [32,56]. Given a collection $\mathcal{C}$ of sets, a
set X is a hitting set of $\mathcal{C}$ iff $C \cap X \neq \emptyset$ for every $C \in \mathcal{C}$. Furthermore, a hitting
set X of $\mathcal{C}$ is minimal if none of its proper subsets is a hitting set. The duality
relation states that a set N is an MUS of F iff N is a minimal hitting set of the
set $MCS_F$ of all MCSes of F. Dually, a set M is an MCS of F iff M is a minimal
hitting set of the set $MUS_F$. Consequently, one can identify all the MCSes and
then compute their minimum minimal hitting set to get an MUS with the minimum
cardinality. However, there can be up to exponentially many MCSes of F,
and thus their complete enumeration is often practically intractable. Our approach
to obtain a lower-bound on the minimum MUS cardinality is the following.
First, we employ a recent MCS enumeration algorithm RIME [11] to generate a
subset $\mathcal{M}$ of $MCS_F$. Subsequently, we compute a minimum minimal hitting set
of $\mathcal{M}$ and use its cardinality as the lower-bound minMUS on the minimum MUS cardinality
while building the wrapper $\mathbb{W}_4$. Note that since $\mathcal{M} \subseteq MCS_F$, it holds that every
hitting set of $MCS_F$ is also a hitting set of $\mathcal{M}$, and hence minMUS is indeed a
sound lower-bound on the cardinality of a minimum hitting set of $MCS_F$.
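The lower-bound computation can be sketched as follows. This is a minimal brute-force illustration, not the implementation used in the tool: the function name `min_hitting_set_size` is hypothetical, MCSes are assumed to be given as sets of clause indices, and the candidate subsets are enumerated by increasing size rather than found by a dedicated hitting set solver.

```python
from itertools import combinations

def min_hitting_set_size(sets, universe):
    """Size of a minimum hitting set of `sets`, found by trying sizes 0, 1, 2, ..."""
    elements = sorted(universe)
    for k in range(len(elements) + 1):
        for cand in combinations(elements, k):
            if all(set(cand) & s for s in sets):
                return k
    raise ValueError("an empty set cannot be hit")

# For F = {x1}, {not x1}, {x2}, {not x1 or not x2} (clause indices 0..3),
# the MCSes are {0}, {1, 2} and {1, 3}; the minimum MUS {0, 1} has size 2.
partial = [{0}, {1, 2}]          # a subset of the MCSes, e.g. found within a time limit
bound_partial = min_hitting_set_size(partial, range(4))
bound_single = min_hitting_set_size([{0}], range(4))
```

Fewer known MCSes can only weaken the bound: with just the MCS {0} the bound drops to 1, which is still sound.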

Let us also briefly describe an algorithm for computing the minimum MUS
by Ignatiev et al. [27], since it works on a similar principle as our approach.
Their algorithm iteratively maintains a set kMCSes of known MCSes; initially
kMCSes = $\emptyset$. In each iteration, the algorithm computes a minimum minimal
hitting set X of kMCSes and checks X for satisfiability. If X is unsatisfiable,
then it is guaranteed to be a minimum MUS. Otherwise, X is enlarged to an
MSS using a single MSS extraction subroutine, the complement of the MSS (i.e.,
an MCS) is added to kMCSes, and the algorithm proceeds with the next iteration.
Observe that one can also terminate their approach after a given time limit and
use the cardinality of the last computed X as a lower-bound on the minimum MUS
cardinality. The main difference between our approach and theirs is that we employ a dedicated
MCS enumerator in the first step and then compute just a single minimum
minimal hitting set, whereas they alternate single MCS extraction with minimum
minimal hitting set computation.
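The iterative scheme described above can be sketched in Python. The sketch only mirrors the textual description; it is not the implementation of [27]. All helper names are hypothetical, SAT checks and hitting sets are computed by brute force, and F is assumed to be unsatisfiable.

```python
from itertools import combinations, product

def is_sat(clauses):
    """Brute-force SAT check over clauses given as tuples of integer literals."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def min_hitting_set(sets, n):
    """A minimum-cardinality subset of {0..n-1} intersecting every set in `sets`."""
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            if all(set(cand) & s for s in sets):
                return set(cand)
    raise ValueError("an empty set cannot be hit")

def grow_to_mss(F, seed):
    """Greedily extend a satisfiable index set to a maximal satisfiable subset."""
    mss = set(seed)
    for i in range(len(F)):
        if i not in mss and is_sat([F[j] for j in mss | {i}]):
            mss.add(i)
    return mss

def minimum_mus(F):
    known_mcses = []
    while True:
        X = min_hitting_set(known_mcses, len(F))
        if not is_sat([F[i] for i in X]):
            return X                      # unsatisfiable hitting set: a minimum MUS
        mss = grow_to_mss(F, X)
        known_mcses.append(set(range(len(F))) - mss)

F = [(1,), (-1,), (2,), (-1, -2)]
smallest = minimum_mus(F)
```

Stopping the loop early and taking the size of the last hitting set X gives the lower bound mentioned in the text.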


4.6 $\mathbb{W}_5$ - Maximum MUS Cardinality


Assuming that we can somehow compute an upper-bound maxMUS on the maximum
cardinality of an MUS of F, we define our next wrapper as $\mathbb{W}_5 = \{N \in
\mathbb{W}_1 \mid |N| \le maxMUS\}$. As in the case of $\mathbb{W}_4$, to build the formula $W_5$ that
encodes $\mathbb{W}_5$, we introduce a Boolean cardinality constraint $atMost(\mathcal{A}, maxMUS)$
expressing that at most maxMUS variables from $\mathcal{A}$ are set to 1:

$$W_5 = W_1 \wedge atMost(\mathcal{A}, maxMUS) \qquad (6)$$

Proposition 9. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_5 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_5$.
Consequently, $|\mathcal{M}_{W_5 \downarrow \mathcal{A}}| = |\mathbb{W}_5|$.


We are not aware of any prior work on computing the cardinality of the
maximum MUS, nor of a reasonable approach for computing at least its upper-bound.
Hence, we propose a custom approach to compute such an upper-bound
maxMUS. The base idea is to exploit our concept of wrappers:


Proposition 10. Let $\mathbb{W}$ be a wrapper, i.e., $\mathbb{W} \subseteq MUS_F \cup SS_F$, $\mathcal{A}$ the set of
activation variables, and W a formula such that for every valuation $\pi$ of $\mathcal{A}$,
$\pi \in \mathcal{M}_{W \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}$. Furthermore, let $maxOnes = \max(\{ones(\pi) \mid \pi \in \mathcal{M}_{W \downarrow \mathcal{A}}\})$
where $ones(\pi) = |\{a_i \in \mathcal{A} \mid \pi(a_i) = 1\}|$. Then maxOnes is an upper-bound on the
maximum MUS cardinality.


We use maxOnes as the value maxMUS while constructing the wrapper $\mathbb{W}_5$. Any
of the wrappers and its encoding presented in this paper can be used as $\mathbb{W}$ and
W, respectively. To determine the value maxOnes, we define a partial MaxSAT
problem using the formula $W \wedge \bigwedge_{a_i \in \mathcal{A}} a_i$, where W are the hard clauses and
$\bigwedge_{a_i \in \mathcal{A}} a_i$ are the soft clauses. To solve the problem, we employ the MaxSAT
solver UWrMaxSat [54].


4.7 $\mathbb{W}_6$ - Component Partitioning


It is often the case that the clauses of F can be partitioned into several components,
i.e., disjoint subsets of clauses, such that every MUS of F consists only of
clauses from a single component. In particular:

Definition 7 (components). Given a clause $f_i \in F$, the component $C(f_i)$ of
$f_i$ is the minimal subset of F satisfying:

1. $f_i \in C(f_i)$, and
2. for every $l \in f_i$ and every $f_j \in F$ with $\neg l \in f_j$, $C(f_i) = C(f_j)$.


Example 2. Assume that $F = \{\{x_1\}, \{\neg x_1\}, \{x_2\}, \{\neg x_1, \neg x_2\}, \{x_3\}, \{\neg x_3\}, \{x_4\},
\{x_4, x_5\}\}$. There are four components: $C_1 = \{\{x_1\}, \{\neg x_1\}, \{x_2\}, \{\neg x_1, \neg x_2\}\}$,
$C_2 = \{\{x_3\}, \{\neg x_3\}\}$, $C_3 = \{\{x_4\}\}$, and $C_4 = \{\{x_4, x_5\}\}$. $C_1$ has two MUSes:
$\{\{x_1\}, \{\neg x_1\}\}$ and $\{\{x_1\}, \{x_2\}, \{\neg x_1, \neg x_2\}\}$, $C_2$ has one MUS: $\{\{x_3\}, \{\neg x_3\}\}$,
and $C_3$ and $C_4$ have no MUSes.


Proposition 11. Let N be an MUS. Then for every two clauses $f_i, f_j \in N$, it
holds that $C(f_i) = C(f_j)$.


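The wrapper $\mathbb{W}_6$ captures the partition of MUSes into components, and it is
defined as $\mathbb{W}_6 = \{N \in \mathbb{W}_1 \mid \forall f_i, f_j \in N.\ C(f_i) = C(f_j)\}$ and encoded via $W_6$:

$$W_6 = W_1 \wedge \bigwedge_{a_i \in \mathcal{A}} \Big( a_i \rightarrow \bigwedge_{f_j \in F \setminus C(f_i)} \neg a_j \Big) \qquad (7)$$

Proposition 12. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_6 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_6$.
Consequently, $|\mathcal{M}_{W_6 \downarrow \mathcal{A}}| = |\mathbb{W}_6|$.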


To partition the input formula F into individual components, we construct
an undirected graph whose vertices are the clauses of F and every two vertices,
$f_i$ and $f_j$, are connected via an edge iff there exists $l \in f_i$ such that $\neg l \in f_j$. The
components of F then correspond to connected components of the graph (which
can be identified in linear time w.r.t. the size of F by traversing the graph). Note
that a similar flip graph has been used in a study [68] on model rotation and its
usage during single MUS extraction.
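The partitioning can be sketched with a union-find structure over clause indices. The code below is an illustrative assumption of ours (function name `components`, clauses as tuples of integer literals); for simplicity it pairs up all clauses containing complementary literals, which is not the linear-time traversal mentioned above but yields the same connected components.

```python
def components(F):
    """Group clause indices into connected components of the flip graph."""
    parent = list(range(len(F)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Index the clauses by the literals they contain.
    occurs = {}
    for i, clause in enumerate(F):
        for lit in clause:
            occurs.setdefault(lit, []).append(i)

    # Connect every clause containing l with every clause containing not l.
    for lit, idxs in occurs.items():
        for i in idxs:
            for j in occurs.get(-lit, []):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(F)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# The formula of Example 2, with x1..x5 numbered 1..5.
F = [(1,), (-1,), (2,), (-1, -2), (3,), (-3,), (4,), (4, 5)]
parts = components(F)
```

On Example 2 this produces the four components listed there, expressed as groups of clause indices.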


4.8 $\mathbb{W}_7$ - Minimal Hitting Set Duality


We again exploit the minimal hitting set duality between MUSes and MCSes
(Sect. 4.5). Recall that if a set M is an MCS of F then $M \cap N \neq \emptyset$ for every
$N \in MUS_F$. We define the wrapper $\mathbb{W}_7$ as $\{N \in \mathbb{W}_1 \mid \forall M \in \mathcal{M}.\ M \cap N \neq \emptyset\}$ where
$\mathcal{M}$ is a set of MCSes. To obtain $\mathcal{M}$, we run the MCS enumeration algorithm
RIME [11] constrained by a user-defined time limit. The encoding $W_7$ of $\mathbb{W}_7$ is:

$$W_7 = W_1 \wedge \bigwedge_{M \in \mathcal{M}} \bigvee_{f_i \in M} a_i \qquad (8)$$


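Proposition 13. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_7 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_7$.
Consequently, $|\mathcal{M}_{W_7 \downarrow \mathcal{A}}| = |\mathbb{W}_7|$.


4.9 $\mathbb{W}_8$ - Literal Negation Cover


Our next wrapper captures the following observation about MUSes.


Proposition 14. Let N be an MUS of F, $f_i \in N$ a clause of N, and $l \in f_i$ a
literal of $f_i$. Then there exists a clause $f_j \in N$ such that $\neg l \in f_j$.


Based on the above proposition, we define the wrapper $\mathbb{W}_8$ as $\mathbb{W}_8 = \{N \in
\mathbb{W}_1 \mid \forall f_i \in N.\ \forall l \in f_i.\ \exists f_j \in N.\ \neg l \in f_j\}$, and encode it as follows:

$$W_8 = W_1 \wedge \bigwedge_{a_i \in \mathcal{A}} \Big( a_i \rightarrow \bigwedge_{l \in f_i} \bigvee_{f_j \in \{f_j \in F \mid \neg l \in f_j\}} a_j \Big) \qquad (9)$$

Proposition 15. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_8 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_8$.
Consequently, $|\mathcal{M}_{W_8 \downarrow \mathcal{A}}| = |\mathbb{W}_8|$.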
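As an illustration of the literal negation cover, the sketch below checks the property of Proposition 14 for a given subset and generates the additional clauses of Eq. 9 over the activation variables. The names `satisfies_negation_cover` and `negation_cover_clauses`, the integer-literal representation, and the numbering of activation variables as 1..n are our own assumptions.

```python
def satisfies_negation_cover(F, N):
    """True iff every literal occurring in the clauses indexed by N has its
    negation occurring in some clause indexed by N."""
    lits = {l for i in N for l in F[i]}
    return all(-l in lits for l in lits)

def negation_cover_clauses(F):
    """CNF over activation variables a_1..a_n (ids 1..n): for every clause f_i
    and literal l in f_i, emit (not a_i or OR of a_j with not-l in f_j)."""
    cnf = []
    for i, fi in enumerate(F, start=1):
        for l in fi:
            cnf.append([-i] + [j for j, fj in enumerate(F, start=1) if -l in fj])
    return cnf

F = [(1,), (-1,), (2,), (-1, -2)]
extra = negation_cover_clauses(F)
```

For example, the satisfiable subset consisting of {x1} and {x2} violates the property and is therefore excluded from the wrapper, while both MUSes of F satisfy it.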


4.10 $\mathbb{W}_9$ - Non-extendable Evidence Models


Assume that N is an MUS and $(\rho_1, \ldots, \rho_n)$ is its evidence. By Definition 6, it
holds that $\rho_i \models N \setminus \{f_i\}$ for every $1 \le i \le n$. Observe that since N is
unsatisfiable, it is also necessarily the case that $\rho_i \models \neg f_i$ for every $1 \le i \le n$. Hence,
we define our next wrapper, $\mathbb{W}_9$, as $\mathbb{W}_9 = \{N \in \mathbb{W}_1 \mid \exists \rho_1, \ldots, \rho_n.\ \forall 1 \le i \le n.\ \rho_i \models
N \setminus \{f_i\}$ and $\rho_i \models \neg f_i\}$. Note that the above-stated property applies universally
to every evidence of an MUS, and yet in the definition of the wrapper we require
only the existence of one such evidence. The reason is that there can be
up to exponentially many evidences for an MUS w.r.t. |Vars(F)|, and hence it is
intractable to reason about all of them in the Boolean encoding of the wrapper.

$$W_9 = W_1 \wedge \bigwedge_{a_i \in \mathcal{A}} \big( a_i \rightarrow \neg f_i[Vars(F)/\mathcal{I}_i] \big) \qquad (10)$$

Proposition 16. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_9 \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_9$.
Consequently, $|\mathcal{M}_{W_9 \downarrow \mathcal{A}}| = |\mathbb{W}_9|$.


4.11 $\mathbb{W}_{10}$ - Enforced Evidence Models


Our final wrapper, $\mathbb{W}_{10}$, again builds on the variable valuations $\rho_1, \ldots, \rho_n$ that
form an evidence of an MUS N of F. In the previous wrapper, $\mathbb{W}_9$, we
exploited that none of the variable valuations can be a model of N. Here, we
express that none of the valuations can be easily modified to be a model of N.
In particular, if $f_i \in N$, then by the definition of an evidence, $\rho_i \models N \setminus \{f_i\}$.
Assume that we pick a literal $l \in f_i$ and turn $\rho_i$ into a valuation $\rho_i'$ by flipping
the assignment to l so that $\rho_i' \models f_i$. Since N is an MUS (i.e., unsatisfiable), there
necessarily exists a clause $f_j \in N$ such that $\rho_i' \not\models f_j$, i.e., $f_j$ forces $\rho_i$ to satisfy
$\neg l$ and hence prevents flipping $\rho_i$ into a model $\rho_i'$ of the whole N. Formally:

Proposition 17. Let N be an MUS, $f_i \in N$ a clause of N, and $\rho_i$ a model of
$N \setminus \{f_i\}$. Then for every literal $l \in f_i$, there exists a clause $f_j \in N$ such that
$\neg l \in f_j$ and $\rho_i \not\models f_j \setminus \{\neg l\}$.


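As in the case of $\mathbb{W}_9$, observe that Proposition 17 applies universally
to every evidence of an MUS; however, since there can be exponentially
many such evidences, it is expensive to reason about all of them. Hence, in
the wrapper, we capture just the existence of such an evidence: $\mathbb{W}_{10} = \{N \in
\mathbb{W}_1 \mid \exists \rho_1, \ldots, \rho_n.\ \forall 1 \le i \le n.\ \rho_i \models N \setminus \{f_i\}$ and if $f_i \in N$ then $\forall l \in f_i.\ \exists f_j \in N.\ \neg l \in f_j$
and $\rho_i \not\models f_j \setminus \{\neg l\}\}$. Equation 11 shows the corresponding encoding via $W_{10}$:

$$W_{10} = W_1 \wedge \bigwedge_{a_i \in \mathcal{A}} \Big( a_i \rightarrow \bigwedge_{l \in f_i} \bigvee_{f_j \in \{f_j \in F \mid \neg l \in f_j\}} \big( a_j \wedge \neg (f_j \setminus \{\neg l\})[Vars(F)/\mathcal{I}_i] \big) \Big) \qquad (11)$$

Proposition 18. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W_{10} \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}_{10}$.
Consequently, $|\mathcal{M}_{W_{10} \downarrow \mathcal{A}}| = |\mathbb{W}_{10}|$.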


4.12 Combining Wrappers and Their Remainders 


In the previous sections, we have presented multiple wrappers, each of which 
captures a different property of MUSes. In this section, we show that the indi- 
vidual wrappers can be easily combined and, hence, form wrappers that provide 
a more accurate description of the set $MUS_F$.


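Proposition 19. Let $\mathcal{A}$ be the set of activation variables, $\mathbb{W}^k$ and $\mathbb{W}^l$ wrappers,
and $\mathbb{R}^k$ and $\mathbb{R}^l$ the remainders of $\mathbb{W}^k$ and $\mathbb{W}^l$. Furthermore, for every $m \in
\{k, l\}$, let $W^m$ and $R^m$ be formulas such that:

- for every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{W^m \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}^m$, and
- for every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{R^m \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{R}^m$.

Then all the following hold:

1. $\mathbb{W}^k \cap \mathbb{W}^l$ is a wrapper and $\mathbb{R}^k \cap \mathbb{R}^l$ is its remainder.
2. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{(W^k \wedge W^l) \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{W}^k \cap \mathbb{W}^l$.
   Consequently, $|\mathcal{M}_{(W^k \wedge W^l) \downarrow \mathcal{A}}| = |\mathbb{W}^k \cap \mathbb{W}^l|$.
3. For every valuation $\pi$ of $\mathcal{A}$, $\pi \in \mathcal{M}_{(R^k \wedge R^l) \downarrow \mathcal{A}}$ iff $\pi_{\mathcal{A},F} \in \mathbb{R}^k \cap \mathbb{R}^l$.
   Consequently, $|\mathcal{M}_{(R^k \wedge R^l) \downarrow \mathcal{A}}| = |\mathbb{R}^k \cap \mathbb{R}^l|$.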


Note that although Proposition 19 discusses only a combination of two wrappers,
it can be applied repeatedly on already combined wrappers. Hence, we can
combine any subset of the wrappers $\mathbb{W}_1, \ldots, \mathbb{W}_{10}$ we proposed. Also, note that
all the formulas $W_2, \ldots, W_{10}$ subsume the formula $W_1$, and hence if we combine
multiple wrappers, we duplicate some clauses. In our implementation, we
first remove all the duplicates and apply other straightforward model-preserving
simplifications before we pass the encoding to a projected model counter.


5 Experimental Evaluation


We have implemented our approach for counting MUSes in a Python-based tool¹,
using the projected model counter GANAK [59] to count the models of wrappers
and remainders, and also using several auxiliary tools as described above.

We presented 10 base wrappers $\mathbb{W}_1, \ldots, \mathbb{W}_{10}$ and showed how to combine
them. Since $\mathbb{W}_1$ is subsumed by all the wrappers $\mathbb{W}_2, \ldots, \mathbb{W}_{10}$, there are $2^9$
combined wrappers. Due to the large number of combinations, we were able
to evaluate only some of them. In particular, we evaluated the combination
$\mathbb{W}_1 \cap \cdots \cap \mathbb{W}_{10}$, denoted Wall, of all wrappers, since it provides the most precise
description of MUSes. We also evaluated 6 wrappers that emerge from Wall by
excluding individual base wrappers or combinations of similar base wrappers,
and also the most basic wrapper $\mathbb{W}_1$. The table below shows the names and
definitions of the evaluated combinations:


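name    | definition
W1      | $\mathbb{W}_1$
Wno23   | $\bigcap_{i \in \{4,5,6,7,8,9,10\}} \mathbb{W}_i$
Wno4    | $\bigcap_{i \in \{2,3,5,6,7,8,9,10\}} \mathbb{W}_i$
Wno5    | $\bigcap_{i \in \{2,3,4,6,7,8,9,10\}} \mathbb{W}_i$
Wno6    | $\bigcap_{i \in \{2,3,4,5,7,8,9,10\}} \mathbb{W}_i$
Wno7    | $\bigcap_{i \in \{2,3,4,5,6,8,9,10\}} \mathbb{W}_i$
Wno8910 | $\bigcap_{i \in \{2,3,4,5,6,7\}} \mathbb{W}_i$
Wall    | $\bigcap_{i \in \{2,\ldots,10\}} \mathbb{W}_i$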


We also evaluated two contemporary MUS enumerators, MARCO² [39] and
UNIMUS³ [10]. Moreover, we evaluated the approximate MUS counter AMUSIC⁴
[13] using its default guarantees, i.e., the provided MUS count estimates
are within a multiplicative factor of 1.8 of the true count with 80% confidence.

Our benchmark suite consists of the 2553 instances previously employed in 
the prior MUS and MSS literature, including those released by authors of AMU- 
SIC [13]. The formulas contain from 78 to 1000 clauses and from 40 to 996 
variables. The MUS count varies from 1 to 1.7 x 10° MUSes. 

We focus on three comparison criteria: 1) the number of benchmarks solved
by the evaluated tools (i.e., benchmarks where the tools provided the MUS
count), 2) the scalability of the tools w.r.t. the number of MUSes in the
benchmarks, and 3) the accuracy of our wrappers.

All experiments were run using a time limit of 3600 s per benchmark on a
Linux machine with an AMD 16-core processor and a 20 GB memory limit. When
using wrappers $\mathbb{W}_4$ and $\mathbb{W}_7$, we used a combined limit of 300 s (included in
the 3600 s) and 100000 MCSes for the MCS enumeration while building the
wrappers; if both wrappers were used, we ran the MCS enumeration just once.
Finally, while constructing a combined wrapper of the form $\mathbb{W} \cap \mathbb{W}_5$, we used
the encoding W of $\mathbb{W}$ to compute the value maxMUS for creating $\mathbb{W}_5$.


1 https://github.com/jar-ben/exactMUSCounter
2 https://sun.iwu.edu/~mliffito/marco/
3 https://github.com/jar-ben/unimus
4 https://github.com/jar-ben/amusic


Table 1. Number of solved benchmarks by individual tools.

tool    | group                         | solved
AMUSIC  | existing tool                 | 623
UNIMUS  | existing tool                 | 833
MARCO   | existing tool                 | 799
W1      | our wrapper-remainder based   | 403
Wno23   | our wrapper-remainder based   | 1475
Wno4    | our wrapper-remainder based   | 1498
Wno6    | our wrapper-remainder based   | 1486
Wno7    | our wrapper-remainder based   | 1445
Wno8910 | our wrapper-remainder based   | 1058
Wall    | our wrapper-remainder based   | 1486
Wno5    | our wrapper-remainder based   | 1500
[Figure: time in seconds (y-axis) against the number of solved benchmarks (x-axis, 0 to 1600)
for AMUSIC, MARCO, UNIMUS, W1, Wno5, Wno8910, and Wall.]

Fig. 2. The number of solved benchmarks in time.


5.1 Solved Benchmarks 


In Table 1, we show the number of benchmarks that were solved by the individual
evaluated tools. The worst performance was achieved by the basic wrapper
W1 ($\mathbb{W}_1$), which is not surprising since it does not provide a good description
of MUSes. AMUSIC solved 623 benchmarks, whereas UNIMUS and MARCO
solved 833 and 799 benchmarks, respectively. Except for Wno8910 (and W1),
which solved only 1058 benchmarks, all the remaining combined wrappers solved
around 1450-1500 benchmarks and hence significantly dominated both AMUSIC
and the two MUS enumerators. Perhaps surprisingly, Wall, which combines all the
base wrappers, ended up in third position; the highest number (1500) of
solved benchmarks was achieved by Wno5, and the second-highest (1498) by
Wno4. Note that Wno5 and Wno4 exclude the encodings of the maximum and
the minimum MUS cardinality, respectively, via Boolean cardinality constraints. In general, solving
Boolean cardinality constraints is often quite hard, and hence even though the
presence of the two wrappers might provide a better description of MUSes, the
constraints increase the hardness of the generated instances.

Figure 2 compares the time needed to solve the benchmarks by a subset (for
better clarity) of the evaluated tools. A point with coordinates [x, y] means
that x benchmarks were solved (by the corresponding tool) within the first y
seconds.


5.2 Scalability w.r.t. the MUS Count


[Figure: comparison of AMUSIC, UNIMUS, MARCO, and Wno5.]

Fig. 3. The number of solved benchmarks w.r.t. the MUS count.

In Fig. 3, we compare the scalability of the evaluated tools w.r.t. the number of
MUSes in the benchmarks. In particular, a point with coordinates [x, y] denotes
that the corresponding tool solved y benchmarks that contained at most x
MUSes. For better clarity, we compare only our best wrapper, Wno5, with
AMUSIC, MARCO, and UNIMUS. Note that whereas AMUSIC scales to instances
with 10° MUSes, the remaining three tools scale only to instances with at most 
a million MUSes. In fact, note that even though AMUSIC solved overall just
623 benchmarks, there are 319 benchmarks that were solved only by AMUSIC.
Based on a closer examination of the results, we identified that AMUSIC scales
much better than the other tools w.r.t. the MUS count; however, it does not
scale so well w.r.t. the number of clauses in the input formula F. This is not
surprising since AMUSIC is just an approximate counter and as such, it needs
to explicitly identify only logarithmically many MUSes w.r.t. |F| even though
there can be up to $O(2^{|F|})$ many MUSes. On the other hand, AMUSIC relies on
repeated calls to a 3-QBF solver whose efficiency highly depends on |F|.


5.3 Accuracy of Wrappers 


Recall that a wrapper $\mathbb{W}$ over-approximates the set $MUS_F$ of all MUSes of F, i.e.,
$\mathbb{W} \supseteq MUS_F$ (Definition 5), and hence we are interested in measuring the accuracy
of the over-approximations. In particular, given a wrapper $\mathbb{W}$ and its remainder
$\mathbb{R}$ constructed over a formula F, we measure the ratio $\frac{|\mathbb{R}|}{|\mathbb{W}|}$. The range of the
ratio is [0, 1); the closer to 0, the more accurate the wrapper is, and in particular
when $\frac{|\mathbb{R}|}{|\mathbb{W}|} = 0$, the wrapper exactly captures the set $MUS_F$ (i.e., $\mathbb{W} = MUS_F$).


We illustrate the ratio $\frac{|\mathbb{R}|}{|\mathbb{W}|}$ achieved by individual wrappers in Fig. 4. A point
with coordinates [x, y] expresses that for x percent of benchmarks completed by
the corresponding tool, the ratio $\frac{|\mathbb{R}|}{|\mathbb{W}|}$ was at most y. As expected, the ratio
achieved by the most basic wrapper W1 ($\mathbb{W}_1$) is very high for all the benchmarks,
i.e., the wrapper captures $MUS_F$ very inaccurately. On the other hand,
the other wrappers achieved a very low ratio for a vast majority of benchmarks,
i.e., they over-approximate $MUS_F$ very tightly. In fact, for 87% of benchmarks,
the wrappers Wno23, Wno4, Wno5, Wno6, and Wall achieved the ratio 0, i.e.,
the wrappers exactly captured the set $MUS_F$. In contrast, the wrappers Wno7
and Wno8910 achieved the ratio 0 for only 68% and 80% of benchmarks, respectively, which
suggests that the use of the corresponding wrappers, $\mathbb{W}_7$, $\mathbb{W}_8$, $\mathbb{W}_9$, and $\mathbb{W}_{10}$, is
vital for an accurate description of $MUS_F$. Moreover, note that the accuracy of
the wrappers highly correlates with the number of solved benchmarks (Table 1),
since Wno7 and Wno8910 (and W1) were the least efficient wrappers.

Fig. 4. The ratio $\frac{|\mathbb{R}|}{|\mathbb{W}|}$ expressing the inaccuracy of wrappers.


6 Future Possible Applications of Wrappers and Remainders


Recall that a wrapper $\mathbb{W}$ over-approximates the set $MUS_F$ of all MUSes of F, i.e.,
$\mathbb{W} \supseteq MUS_F$ (Definition 5). Moreover, in Sect. 5, we empirically witnessed that
the best of our wrappers usually over-approximate $MUS_F$ very tightly or
even capture it exactly. Consequently, the propositional encodings W and R of
a wrapper $\mathbb{W}$ and its remainder $\mathbb{R}$, respectively, can very precisely capture the
set $MUS_F$. We strongly believe that such an accurate propositional description of
$MUS_F$ paves the way to efficiently solving many other MUS related problems,
which we will thoroughly examine in our future work, including, e.g., the following:


Approximate MUS Counting. Recall that $|MUS_F| = |\mathbb{W}| - |\mathbb{R}|$. Assuming
that $|\mathbb{R}|$ is much smaller than $|\mathbb{W}|$ and observing that $\mathbb{R} \subseteq \mathbb{W}$, computing
$|\mathcal{M}_{R \downarrow \mathcal{A}}| = |\mathbb{R}|$ should be much faster than computing $|\mathcal{M}_{W \downarrow \mathcal{A}}| = |\mathbb{W}|$. Hence, one
could first relatively quickly compute the exact value $|\mathcal{M}_{R \downarrow \mathcal{A}}|$, and then use an
approximate model counter to find an estimate w' of $|\mathcal{M}_{W \downarrow \mathcal{A}}|$. The MUS count
$|MUS_F|$ can then be approximated as $w' - |\mathbb{R}|$. The accuracy of the approximation
depends on the approximation guarantees of the model counter (e.g., using
ApproxMC4 [18,60], we get the $(\varepsilon, \delta)$-guarantees provided by AMUSIC).


MUS Enumeration. Assume a valuation $\pi$ of the activation variables $\mathcal{A}$ and
the corresponding activated subset $\pi_{\mathcal{A},F} = \{f_i \in F \mid \pi(a_i) = 1\}$ of F. As shown in
Sect. 4, $\pi_{\mathcal{A},F}$ is an MUS iff $\pi \in \mathcal{M}_{W \downarrow \mathcal{A}}$ and $\pi \notin \mathcal{M}_{R \downarrow \mathcal{A}}$. Hence, one can enumerate
MUSes by enumerating projected models of W and discarding those that are
also projected models of R.
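The idea can be mimicked at the level of clause subsets. The following sketch is a brute-force stand-in for a projected model enumerator and relies on our own assumptions: it uses the basic wrapper (subsets that have an evidence) and the remainder of satisfiable subsets, enumerates all subsets explicitly, and the names `is_sat` and `enumerate_muses` are hypothetical.

```python
from itertools import combinations, product

def is_sat(clauses):
    """Brute-force SAT check over clauses given as tuples of integer literals."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def enumerate_muses(F):
    """Yield index sets that belong to the wrapper but not to the remainder."""
    n = len(F)
    for k in range(n + 1):
        for A in combinations(range(n), k):
            # Wrapper: removing any single activated clause leaves a satisfiable set.
            in_wrapper = all(is_sat([F[j] for j in A if j != i]) for i in A)
            # Remainder: the activated subset itself is satisfiable.
            in_remainder = is_sat([F[i] for i in A])
            if in_wrapper and not in_remainder:
                yield set(A)

F = [(1,), (-1,), (2,), (-1, -2)]
muses = list(enumerate_muses(F))
```

On this formula the wrapper has 13 elements and the remainder 11, so exactly two subsets survive the filtering.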


MUS Sampling. To sample an MUS of F, one can iteratively sample an element
$\pi$ of $\mathcal{M}_{W \downarrow \mathcal{A}}$ until one finds $\pi$ such that $\pi \notin \mathcal{M}_{R \downarrow \mathcal{A}}$, i.e., $\pi_{\mathcal{A},F}$ is an MUS. Note
that while the past decade has witnessed significant progress in the development
of projected model sampling approaches [16,22,55] (with various distribution
guarantees), we are not aware of any existing MUS sampling technique (with
reasonable distribution guarantees).


Minimum and Maximum MUS Cardinality. As discussed in Sect. 4.6 ($\mathbb{W}_5$),
one can over-approximate the maximum MUS cardinality by finding a model
$\pi \in \mathcal{M}_{W \downarrow \mathcal{A}}$ that maximizes the number of variables assigned 1. Similarly, one can
under-approximate the minimum MUS cardinality by finding a model $\pi \in \mathcal{M}_{W \downarrow \mathcal{A}}$
that minimizes the number of variables assigned 1. Intuitively, the smaller $|\mathbb{R}|$
is, the more precise the approximations that can be expected. Moreover, by checking whether
$\pi \notin \mathcal{M}_{R \downarrow \mathcal{A}}$, one can actually verify whether $\pi_{\mathcal{A},F}$ is an MUS.


MUS Membership. The MUS membership problem is to decide if a clause
$f_i \in F$ belongs to an MUS of F, and it is known to be $\Sigma_2^P$-complete [31,35,37].
Contemporary techniques for deciding the problem are mainly based on solving
2-QBF or 3-QBF encodings [13,31]. Our wrapper-based framework allows for an
alternative approach: to decide if a clause $f_i$ belongs to an MUS of F, one can
check if there exists a valuation $\pi$ of $\mathcal{A}$ such that $\pi(a_i) = 1$, $\pi \in \mathcal{M}_{W \downarrow \mathcal{A}}$, and
$\pi \notin \mathcal{M}_{R \downarrow \mathcal{A}}$. Note that when $|\mathbb{R}| = 0$ or when $|\mathbb{R}|$ can be bounded by a constant,
this check boils down to a single call of a SAT solver.


7 Conclusion and Future Work 


In this paper, we focused on the problem of MUS counting and proposed the first
exact MUS counter, called CountMUST, that does not rely on explicit MUS
enumeration. The base idea is to reduce the problem of MUS counting to (two queries
of) projected model counting via the framework of wrappers and remainders. The
availability of a scalable projected model counter, GANAK, allowed CountMUST
to scale much better and solve significantly more instances than other existing
approaches. Moreover, as discussed in Sect. 6, the tightness of wrappers and
remainders opens up new potential applications ranging from approximate
counting to enumeration, membership checking, and the like.

We also revisit the complementary nature of CountMUST and AMUSIC with
respect to the size of instances and the MUS count. The complementary performance
opens up opportunities for a portfolio approach that can achieve the best
of both worlds. Finally, let us note that we are fighting here the chicken-and-egg
nature of the existence of practical applications and scalable algorithmic
techniques for problems in automated reasoning. Often the lack of scalable
techniques leads to a lack of incentives for end-users to design reductions to
practical applications, and vice versa. Even though MUS counting already has
many applications in the diagnosis domain [25,29,48-50,65], we hope that the
availability of CountMUST will break this chicken-and-egg loop in other areas
and enable further investigations into MUS counting applications.

Abstract. First-Order Linear Temporal Logic (FOLTL) is particularly
convenient to specify distributed systems, in particular because of the 
unbounded aspect of their state space. We have recently exhibited novel 
decidable fragments of FOLTL which pave the way for tractable verifi- 
cation. However, these fragments are not expressive enough for realistic 
specifications. In this paper, we propose three transformations to trans- 
late a typical FOLTL specification into two of its decidable fragments. 
All three transformations are proved sound (the associated propositions 
are proved in Coq) and have a high degree of automation. To put these 
techniques into practice, we propose a specification language relying on 
FOLTL, as well as a prototype which performs the verification, relying 
on existing model checkers. This approach allows us to successfully ver- 
ify safety and liveness properties for various specifications of distributed 
systems from the literature. 


1 Introduction 


Verifying properties of distributed protocols is a demanding endeavor. Several 
approaches have been proposed, ranging from verification frameworks, like
IronFleet [12] or Verdi [27], to tool-supported languages like TLA+ [17], Event-B [1]
or Ivy [20,21]. However, when systems of arbitrary size are considered, verifying 
properties usually requires some remarkable effort: inductive invariants must be 
sought and exhibited (possibly with tool support), and some manual proof effort 
may still be necessary. Worse, when liveness properties are checked, this effort 
becomes very substantial and tool support is still quite limited. 

A natural setting for specification, in particular for safety and liveness prop- 
erties of infinite-state systems, is (mono- and many-sorted) first-order linear 
temporal logic (FOLTL). However, it is highly undecidable [13,14]. In recent 
work [23,24], some of the present authors devised the “Geneva” fragments of 
FOLTL, which were shown to be decidable. More precisely, these fragments 
enjoy a “bounded domain property” (BDP), a form of computable finite model
property over the first-order domains. Decidability is obtained by expanding 
first-order quantifiers over the domains (using the computed bounds) and then 
relying on (decidable) propositional-LTL satisfiability checking. 

The Geneva fragments are rather expressive but still have limitations that 
thwart their use for the specification of systems. In particular, most forms of 
fairness assumptions, as well as frame conditions (which specify what does not 
change when a transition happens in a system), do not fit in the fragments. 
Furthermore, topological properties of systems (such as ring topologies) are hard 
or even impossible to specify. 

In this article, we mitigate this deficiency by exhibiting three transformations
that map an undecidable, expressive fragment of $\mathrm{FOLTL}^{*}_{=}$ (FOLTL with
equality and reflexive-transitive closure, to characterize topological properties)
into decidable fragments (akin to the Geneva ones), thus allowing the automatic
verification of safety and liveness properties of infinite-state systems. Then we
apply these techniques to the verification of properties of various protocols.

Notice that none of the proposed transformations is complete. It is actually 
impossible to devise complete transformations, even assuming a procedure that 
would be fed additional user input. This is because FOLTL is not even
semi-decidable.¹

In more detail, we make the following contributions (cf. Fig. 1): 


- we define an undecidable, expressive specification language, called Cervino,
the semantics of which is expressed in terms of $\mathrm{FOLTL}^{*}_{=}$;

— we exhibit two fragments of many-sorted FOLTL that enjoy the BDP; 

— we devise three abstraction transformations that map (the semantics of) 
Cervino into one of the said two fragments: 

  • the first of these transformations (called TEA) is fully automatic while
    the other two (TTC and TFC) must be passed additional data (in the
    shape of peculiar formulas);

  • these three transformations, as well as other minor ones, are implemented
    as tactics in a prototype tool [22];

  • the associated theorems and lemmas are also formalized and proved
    correct, using Coq [22];

— we demonstrate our approach on several case studies that are often used as 
benchmarks in the literature. 


This article is organized as follows: in Sect. 2, we illustrate our approach using 
an example (a leader election protocol). Section 3 introduces definitions as well 
as the two fragments used in the rest of the paper. In Sect. 4, we present basic 
techniques, which are used in some of our transformations. Then, in Sect. 5, we 
formalize the automatic TEA transformation. Sections 6 and 7 present,
respectively, the TFC and TTC transformations. In Sect. 8, we evaluate our approach
on various protocols. Finally, we compare our results with related work in Sect. 9. 


¹ Indeed, having such a transformation would give a procedure for semi-decidability
by testing all possible inputs on this transformation.
[Figure: Cervino spec. and property; semantics in $\mathrm{FOLTL}^{*}_{=}$ (undecidable); associated propositions; outcomes: correct, inconclusive.]

Fig. 1. Summary of the contributions of this article


2 The Cervino Language 


In this section, we present the Cervino modeling language informally. Its semantics,
given in terms of many-sorted $\mathrm{FOLTL}^{*}_{=}$ (FOLTL with equality and
reflexive-transitive closure), is formally introduced in Sect. 3.3. This language is suitable
for specifying infinite-state systems. It is undecidable but we enforce some syntactic
constraints anyway, in order to ease the further application of transformations
mapping into decidable fragments of logic.

Cervino is illustrated in Fig. 2 using the example of a leader election proto- 
col [6] in a ring of unbounded size. Nodes sit in a directed ring and each node 
has a unique ID. There is a total order on IDs. The goal of the protocol is to 
elect a leader (in practice, the one with the greatest ID). A node can send to its 
successor in the ring the IDs it knows about, the receiver keeping those that are 
greater than its own ID. A node is elected if it receives its own ID. 
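
As an illustration only, the protocol described above can be simulated on a ring of fixed, finite size. The following Python sketch is not part of Cervino or of the prototype tool: the function name `elect_leader` and the list-based encoding of the ring are assumptions made for this example, and the sketch says nothing about rings of unbounded size, which are the actual target of this article.

```python
def elect_leader(ring_ids):
    """Simulate the ring protocol; node i sends to node (i + 1) % n.

    ring_ids is a list of pairwise-distinct, totally ordered IDs
    (hypothetical encoding chosen for this sketch).
    """
    n = len(ring_ids)
    to_send = [set() for _ in ring_ids]  # mailbox of each node
    changed = True
    while changed:  # iterate until no mailbox changes any more
        changed = False
        for src in range(n):
            dst = (src + 1) % n
            # A node forwards its own ID and every ID in its mailbox.
            for ident in to_send[src] | {ring_ids[src]}:
                # The receiver keeps only IDs not smaller than its own.
                if ident >= ring_ids[dst] and ident not in to_send[dst]:
                    to_send[dst].add(ident)
                    changed = True
    # A node is elected if it has received its own ID.
    return {ring_ids[i] for i in range(n) if ring_ids[i] in to_send[i]}
```

On a finite ring, only the greatest ID can travel all the way around, so the returned set contains exactly that ID.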


2.1 Sorts, Relations and Axioms 


A Cervino specification may define sorts, (first-order) sorted relations and sorted 
constants. An interpretation structure for such a specification is a set of infinite 
traces of states. Classically, a state maps a sort to a non-empty set, a constant 
to an element of such a set and a relation to a set of tuples, all respecting the 
obvious sorting and arity constraints. The interpretation of sets and constants 
is rigid while that of relations is flexible. 

In the example, nodes and their IDs are conflated into a single sort Node. In addition:
an elected relation represents the set of elected nodes; a succ relation relates
each node to its successor in the ring topology; a toSend relation represents the
mailbox of each node; an lte relation defines a total ordering on nodes; and an lmax
constant represents the maximal identifier among nodes.

States can be constrained by axioms, i.e. sets of formulas. The latter belong to
$\mathrm{FOLTL}^{*}_{=}$, that is, they can mix first-order logic (with equality) with the “always”
(G), “eventually” (F) and “next” (written as a prime symbol and only applied to
atoms) connectives, as well as a reflexive-transitive closure connective (written *). However,
we enforce a syntactic constraint on axioms: after converting them to negation
sort Node                                  // nodes and IDs are conflated
relation succ in Node * Node
  using btw                                // btw[succ] is enabled
relation lte in Node * Node
relation toSend in Node * Node
relation elected in Node                   // set of elected nodes
constant lmax in Node                      // node with the maximal identifier
axiom connected { G (∀ x, y: Node · succ*(x, y)) }
axiom order { G (∀ id: Node · lte(id, lmax)) … }   // classic total ordering axioms
axiom is_elected { G (∀ x: Node · elected'(x) ⇔ (elected(x) ∨ toSend'(x, x))) }
axiom init {                               // in the initial state
  ∃ y: Node · …
  ∀ x, id: Node · !toSend(x, id)
  ∀ x: Node · !elected(x) }
event send [src: Node]
  modifies toSend at { (dst,id) · (toSend(src,id) ∨ id = src) ∧ succ(src,dst) },
  elected {
  ∀ dst: Node, id: Node · (succ(src,dst) ∧ (toSend(src,id) ∨ id = src)) ⇒
    (toSend'(dst, id) ⇔
      (toSend(dst, id) ∨ (lte(dst, id) ∧ (id = src ∨ toSend(src, id))))) }
check Safety { G (∀ x: Node · elected(x) ⇒ x = lmax) }
  using TFC …                              // hidden parameters (see Sect. 6)
check Liveness { F (∃ y: Node · elected(y)) }
  assuming {
    ∀ src: Node · G F {
      ∀ dst: Node, id: Node · (succ(src,dst) ∧ (toSend(src,id) ∨ id = src)) ⇒
        (toSend'(dst, id) ⇔
          (toSend(dst, id) ∨ (lte(dst, id) ∧ (id = src ∨ toSend(src, id))))) } }
  using TTC …                              // hidden parameters (see Sect. 7)


Fig. 2. Specification of the leader election protocol (prettified syntax) 


normal form (NNF), an existential quantifier cannot appear in the scope of a
universal quantifier or of a G connective (no $\forall \ldots \exists \ldots$, no $\mathbf{G} \ldots \exists \ldots$).

A binary relation r can be “tagged” (written using btw) to force r to be a
function² and enable a special ternary relation btw[r]. Then, btw[r](x,y,z) means
that there is an acyclic path between x and z passing through y. The semantics
of btw[r] is given through axioms (see Definition 14) and is related to r* through
the following equivalence: r*(x, y) $\Leftrightarrow$ btw[r](x,y,y).


2.2 Events 


Events specify how the system may evolve from one state to another. Events 
(more precisely: event schemas) are declared with a name and a list of argu- 
ments that are the only variables that can appear free in the body of the event. 


² $\forall x, y, z : s \cdot r(x,y) \land r(x,z) \Rightarrow y = z$.
The declaration of an event also features a modifies section describing which
tuples of which relations may be modified by the event. Other relations or parts 
of relations are necessarily left unchanged. The body of an event is specified 
in primed FO (FO augmented with primed relation symbols representing the 
value of these relations in the next state) with the additional constraint that no 
existential quantifier may appear positively in the body. 

The semantics for events is standard and comparable to the one used in
TLA+ or Electrum: in every state, at least one event is fired. In other words,
there is a valuation for arguments of at least one event such that the body of the
said event evaluates to true. More formally (and ignoring sorting constraints for
the sake of readability), given event bodies $\varphi_1, \ldots, \varphi_n$ and arguments $y_1, \ldots, y_m$
appearing as free variables in $\varphi_i$, the semantics of events is given by the formula
$\mathbf{G}\big(\bigvee_{i=1}^{n} \exists y_1, \ldots, y_m \cdot \varphi_i\big)$. We insist that this formula is only implicit: it cannot
be input by the specifier as it is the purpose of transformations to massage it.
Finally, if needed, fairness constraints must be added by the specifier.

In the example, the send event represents the fact that a node updates 
its successor’s mailbox by adding all IDs that are larger than the successor’s 
ID. This way, the largest ID is passed along the ring. Notice we use univer- 
sal quantification: we could have defined dst and id as parameters of send, but 
the implicit existential quantification, although theoretically acceptable, can be 
costly performance-wise (as succ is a function, this is significant for the id argu- 
ment only). We also specify that the event modifies the toSend relation for 
specific pairs of a node and an identifier, only if these satisfy a condition saying 
that the ID is in the sender’s mailbox (or corresponds to the sender’s ID) and if 
the node is the sender’s successor (the body of the event says what happens in 
that case). 


2.3 Commands 


A check declares a command to verify whether a property holds. To do so, a 
command uses a certain tactic (TEA, TFC, TTC), as well as additional parameters
in the case of TFC and TTC (these are presented in Sect. 6 and 7, respectively).
The purpose of this article is precisely to present these transformations. We 
notice that a command may also be associated with additional, specific axioms 
in an assuming section (in the example, this section contains a fairness property, 
necessary to prove the liveness property). 


3 Background on FOLTL 


3.1 Syntax and Semantics of FOLTL 


The basic vocabulary of MSFOLTL (that we simply call FOLTL in the following)
is defined out of a signature $\Sigma = (S, Const, \mathcal{R})$ where $S$ is a set of sorts, $Const$
is the set of (sorted) constant symbols and $\mathcal{R} = (\mathcal{R}_{\bar{s}})_{\bar{s} \in S^{+}}$ is a family of sets of
relation symbols, with $\mathcal{R}_{\bar{s}}$ the set of relation symbols over tuples of sort $\bar{s}$.
Definition 1 (Formulas). Given a signature $\Sigma = (S, Const, \mathcal{R})$ and a set of
variables $V$, $\mathrm{FOLTL}_{=}$ formulas over $\Sigma$ and $V$ are defined inductively by the
following grammar:


$$\varphi ::= r(t_1, \ldots, t_n) \mid t_1 = t_2 \mid \neg\varphi \mid \varphi \lor \varphi \mid \mathbf{X}\varphi \mid \mathbf{F}\varphi \mid \forall x : s \cdot \varphi \mid \exists x : s \cdot \varphi$$


where $x \in V_s$, $r \in \mathcal{R}_{s_1 \ldots s_n}$ and $t_i \in V_{s_i} \cup Const_{s_i}$ for each $i$, with $V_s$ (resp.
$Const_s$) the set of variables (resp. constants) of sort $s$.


$\mathbf{X}$ and $\mathbf{F}$ stand for the “next” and “eventually” connectives. FOLTL usually
includes the $\mathbf{U}$ (“until”) connective; however, it is not required in this paper. We also
define “always” as $\mathbf{G}\varphi = \neg\mathbf{F}(\neg\varphi)$. Similarly, the classical propositional connectives
$\land$, $\Rightarrow$ and $\Leftrightarrow$ are defined in the natural way. Additionally:


- We write $\varphi[x]$ for a formula having $x$ as a free variable.

- We write $FV(\varphi)$ for the set of free variables of a formula, defined in the
obvious way. A formula $\varphi$ is said to be closed if $FV(\varphi) = \emptyset$.

- Classically, a formula is in negation normal form (NNF) if negations only
appear in front of relation symbols.

- If $C$ is a subset of $\{\mathbf{X}, \mathbf{F}, \mathbf{G}\}$ then we denote by $\mathrm{FOLTL}_{=}(C)$ (resp.
$\mathrm{FOLTL}(C)$) the set of $\mathrm{FOLTL}_{=}$ formulas (resp. FOLTL formulas without
equality) in NNF containing only temporal operators from $C$.

- A formula $\ell$ is called a literal if $\ell = r(t_1, \ldots, t_n)$ or $\ell = \neg r(t_1, \ldots, t_n)$ where
$r \in \mathcal{R}_{\bar{s}}$ and $t_i \in Const \cup V$ for each $i$.
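
The conversion to NNF mentioned above can be sketched as follows. This is a minimal illustration, not the prototype's implementation: the tuple encoding of formulas (`("rel", name, args)`, `("eq", t1, t2)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, `("X", f)`, `("F", f)`, `("G", f)`, `("forall", x, s, f)`, `("exists", x, s, f)`) and the function name `nnf` are assumptions made for this example.

```python
def nnf(f, negate=False):
    """Push negations down to atoms; `negate` tells whether f is under a negation."""
    op = f[0]
    if op in ("rel", "eq"):
        return ("not", f) if negate else f
    if op == "not":
        return nnf(f[1], not negate)
    if op in ("and", "or"):
        dual = {"and": "or", "or": "and"}[op]  # De Morgan
        return (dual if negate else op, nnf(f[1], negate), nnf(f[2], negate))
    if op == "X":
        # "next" is its own dual on infinite traces
        return ("X", nnf(f[1], negate))
    if op in ("F", "G"):
        dual = {"F": "G", "G": "F"}[op]
        return (dual if negate else op, nnf(f[1], negate))
    if op in ("forall", "exists"):
        dual = {"forall": "exists", "exists": "forall"}[op]
        return (dual if negate else op, f[1], f[2], nnf(f[3], negate))
    raise ValueError(f"unknown connective: {op}")
```

For instance, applying `nnf` to the encoding of $\neg\mathbf{F}(\neg\varphi)$ for an atom $\varphi$ yields the encoding of $\mathbf{G}\varphi$, matching the definition of “always” given above.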


We now introduce the semantics of $\mathrm{FOLTL}_{=}$. In the interpretation structures
defined below, the interpretation of relations varies over time while that
of constant symbols does not.


Definition 2 (Interpretation Structure). Given a signature $\Sigma =
(S, Const, \mathcal{R})$, an (interpretation) structure $\mathcal{M}$ (over $\Sigma$) is a triple
$((D_s)_{s \in S}, \sigma, \rho)$ where:


- $D = (D_s)_{s \in S}$ is a family of pairwise-disjoint nonempty sets and each $D_s$ is
the domain of the sort $s$.

- $\sigma$ maps each constant $c \in Const_s$ to a domain element $\sigma(c) \in D_s$.

- $\rho$ maps any pair $(i, r) \in \mathbb{N} \times \mathcal{R}_{s_1 \ldots s_n}$ of instant and relation to the set $\rho(i, r) \subseteq
D_{s_1} \times \ldots \times D_{s_n}$ of tuples satisfying $r$ at instant $i$.


Definition 3 (Assignment). An assignment $C$ in domains $(D_s)_{s \in S}$ for variables
in $V$ is a map $V \to D$. We write $C[x \mapsto d]$ for the assignment defined as
$C[x \mapsto d](x) = d$ and $C[x \mapsto d](y) = C(y)$ if $y \neq x$. The extension of $C$ to terms,
also written $C$, is defined in the obvious way.


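Definition 4 (Satisfaction). Given a structure $\mathcal{M} = (D, \sigma, \rho)$ and an assignment
$C$, the satisfaction relation $\models$ is defined by induction on formulas, for
any $i \in \mathbb{N}$, as follows:


- $\mathcal{M}, i, C \models t_1 = t_2$ iff $C(t_1) = C(t_2)$;
- $\mathcal{M}, i, C \models r(t_1, \ldots, t_n)$ iff $(C(t_1), \ldots, C(t_n)) \in \rho(i, r)$;
- $\mathcal{M}, i, C \models \neg\varphi$ iff $\mathcal{M}, i, C \not\models \varphi$;
- $\mathcal{M}, i, C \models \varphi_1 \lor \varphi_2$ iff $\mathcal{M}, i, C \models \varphi_1$ or $\mathcal{M}, i, C \models \varphi_2$;
- $\mathcal{M}, i, C \models \mathbf{X}\varphi$ iff $\mathcal{M}, i+1, C \models \varphi$;
- $\mathcal{M}, i, C \models \mathbf{F}\varphi$ iff there exists $k \in \mathbb{N}$ s.t. $\mathcal{M}, i+k, C \models \varphi$;
- $\mathcal{M}, i, C \models \exists y : s \cdot \varphi$ iff there exists $d \in D_s$ s.t. $\mathcal{M}, i, C[y \mapsto d] \models \varphi$;
- $\mathcal{M}, i, C \models \forall x : s \cdot \varphi$ iff for every $d \in D_s$, we have $\mathcal{M}, i, C[x \mapsto d] \models \varphi$.


Given a closed formula $\varphi$, we write $\mathcal{M}, k \models \varphi$ if $\mathcal{M}, k, [\,] \models \varphi$, where $[\,]$ is the empty
assignment. Then $Mod(\varphi)$ denotes the set of structures $\mathcal{M}$ such that $\mathcal{M}, 0 \models \varphi$.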




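Definition 5 (Reflexive-Transitive Closure). We write $\mathrm{FOLTL}^{*}_{=}$ for the
enrichment of $\mathrm{FOLTL}_{=}$ with a reflexive-transitive closure connective. Then for
any sort $s \in S$ and any binary relation symbol $r \in \mathcal{R}_{s\,s}$, the language of $\mathrm{FOLTL}^{*}_{=}$
is augmented with a fresh binary relation symbol $r^{*} \in \mathcal{R}_{s\,s}$, and we have:


$\mathcal{M}, i, C \models r^{*}(t_1, t_2)$ iff $\mathcal{M}, i, C \models t_1 = t_2$ or there exists $n \in \mathbb{N}$ s.t.
$$\mathcal{M}, i, C \models \exists x_0, \ldots, x_n \cdot t_1 = x_0 \land t_2 = x_n \land \Big(\bigwedge_{0 \le i \le n-1} r(x_i, x_{i+1})\Big).$$

Let $\varphi, \varphi'$ be two $\mathrm{FOLTL}^{*}_{=}$ formulas. If for any structure $\mathcal{M}$ and any assignment
$C$, we have $\mathcal{M}, 0, C \models \varphi$ iff $\mathcal{M}, 0, C \models \varphi'$ then we say that $\varphi$ and $\varphi'$ are
logically equivalent, written $\varphi \equiv \varphi'$.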
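
At a single instant and over a finite domain, the interpretation of $r^{*}$ can be computed by a simple fixpoint. The sketch below is only an illustration of Definition 5 under these assumptions; the function name `rt_closure` and the encoding of a relation as a set of pairs are hypothetical choices made for this example.

```python
def rt_closure(domain, r):
    """Reflexive-transitive closure of the binary relation r over a finite domain."""
    closure = {(x, x) for x in domain} | set(r)  # reflexive part plus r itself
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in r:
                # extend a path a ->* b with one more r-step b -> d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

On a three-node ring, every pair of nodes ends up in the closure, which is what the `connected` axiom of the leader election example requires of `succ*`.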


3.2 Bounded Domain Property 


In this section we introduce the Bounded Domain Property (BDP) and present 
two fragments of FOLTL that enjoy the BDP. These fragments play an important 
role in the verification procedures presented in this article. 


Definition 6 (Bounded Domain Property). A fragment Frag of FOLTL
enjoys the bounded domain property (BDP) if, for every $\varphi \in$ Frag, either $\varphi$ is not
satisfiable or there is a domain-finite structure $\mathcal{M}$ s.t. $\mathcal{M}, 0 \models \varphi$ and whose domain
size is computable from $\varphi$. Additionally, the BDP implies decidability.


We now present the two fragments that are used in this paper. Both fragments 
are included in a larger fragment for which the BDP is established in [24]. 


Definition 7 (LTR fragment). A formula $\varphi$ of $\mathrm{FOLTL}_{=}$ is said to belong to
the (multisorted) Linear-Temporal Reasoning (LTR) fragment if $\varphi$ is in NNF and
existential quantifiers only appear in the head of $\varphi$.


Theorem 1 ([16,24]). Any formula $\varphi \in$ LTR (even with equality) enjoys the
BDP. The bound of verification for each sort is the sum of the numbers of exis- 
tential quantifiers and constant symbols over this sort. 
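
The bound of Theorem 1 is straightforward to compute. The following sketch assumes a hypothetical tuple encoding of formulas (`("exists", x, s, f)`, `("forall", x, s, f)`, `("and", f, g)`, `("or", f, g)`, `("not", f)`, `("X", f)`, `("F", f)`, `("G", f)`, and atoms `("rel", name, args)` or `("eq", t1, t2)`) and a dictionary mapping each constant symbol to its sort; neither comes from the prototype tool.

```python
from collections import Counter

def ltr_bounds(formula, constants):
    """Per-sort bound: number of existential quantifiers plus constants of that sort."""
    bounds = Counter(constants.values())  # one element per constant symbol

    def walk(f):
        op = f[0]
        if op == "exists":
            bounds[f[2]] += 1  # one element per existential quantifier
        if op in ("forall", "exists"):
            walk(f[3])
        elif op in ("and", "or"):
            walk(f[1])
            walk(f[2])
        elif op in ("not", "X", "F", "G"):
            walk(f[1])
        # atoms ("rel", "eq") contain no quantifiers

    walk(formula)
    return dict(bounds)
```

For a formula with one existential quantifier over Node and a single constant of sort Node, the computed bound for Node is 2.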


³ It is possible to fully axiomatize the transitive closure in pure FOLTL; however, since
it does not fit into the scope of this paper, such an axiomatization is not presented
here and we simply extend FOLTL with the classical definition of transitive closure.


344 Q. Peyras et al. 


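Definition 8. An FOLTL formula $\psi$ is in FOLTL(Sf, VL) if $\psi = \exists y_1 : s_1, \ldots, y_n :
s_n \cdot \theta[y_1, \ldots, y_n]$, where $\theta$ has the following syntax: $\theta ::= \ell \mid \alpha \mid \theta \lor \theta \mid \theta \land \theta \mid \mathbf{X}\theta \mid
\mathbf{G}\theta \mid \mathbf{F}\theta$, where $\alpha$ is an FO formula in NNF without any existential quantifier
and $\ell$ is a literal.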


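Definition 9. FOLTL(X,F,V|) is defined by the following grammar: $\varphi ::= \ell \mid
\alpha \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid \mathbf{X}\varphi \mid \mathbf{F}\varphi \mid \exists y : s \cdot \varphi$, with $\alpha$ an FO formula in NNF without
any existential quantifier, $\ell$ a literal and $y \in V$.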


Definition 10 (Geneva fragment). The Geneva fragment of FOLTL consists
of formulas $\psi \land \mathbf{G}(\varphi)$ s.t. $\varphi$ is a closed formula of FOLTL(X,F,V|) and $\psi$ is a
closed formula of FOLTL(Sf, VL).


Definition 11. Given a formula $\varphi \in$ FOLTL(X,F) in NNF, we define its stride
$K_{\varphi}$ as the maximal number of nested X connectives. Formally:


Ke = Kro = 0 (if £ is a literal) Kx = Kg+1 
Kyo. = Kiz- = Kọ K6.n¢2 = Koiva = Mmax( Kg, Koa) 


Theorem 2 ([24]). The Geneva fragment enjoys the BDP. If $\psi \land \mathbf{G}(\varphi)$ is a
satisfiable formula in this fragment, for each sort $s$ the (exact) bound on the
domain size is: $|Const_s| + (K_{\varphi} + 1) \times |V_s|$.
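
As a small worked example of the bound in Theorem 2, the arithmetic can be written as a one-line function. The function and parameter names below are assumptions made for this sketch; the inputs stand for $|Const_s|$, the stride $K_{\varphi}$ and $|V_s|$ as they appear in the theorem.

```python
def geneva_bound(num_constants, stride, num_variables):
    """Domain-size bound for one sort: |Const_s| + (K + 1) * |V_s|."""
    return num_constants + (stride + 1) * num_variables
```

For instance, with one constant, a stride of 2 and three variables of the sort, the bound is $1 + (2 + 1) \times 3 = 10$.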


3.3 Semantics of Cervino 


In this section, we define the semantics of a Cervino machine as an $\mathrm{FOLTL}^{*}_{=}$
formula. Notice first that, in Cervino, the next instant is referred to using the prime
symbol, applied to relations only: this translates to an FOLTL sub-formula using
the X connective, after application of the semantics.

Now, a frame condition is defined as a formula that specifies that a certain
relation will not change (between the instants before and after the event occurring)
for tuples satisfying some constraints.


Definition 12 (Frame condition). We define a frame condition as a formula
expressing that, under some hypotheses, a certain relation does not change along
a transition. Given a tuple $(r, \bar{x}, \psi)$ where $r \in \mathcal{R}_{\bar{s}}$, $\bar{x}$ is a tuple of variables of sorts $\bar{s}$, and $\psi$ is a Boolean
formula where variables in $\bar{x}$ may appear free, we define the frame condition
$unchanged[r, \bar{x}, \psi]$ as the formula $\forall \bar{x} : \bar{s} \cdot \psi \Rightarrow (r(\bar{x}) \Leftrightarrow \mathbf{X}\, r(\bar{x}))$.


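Definition 13 (Semantics of an event). Let $ev$ be an event of a Cervino
machine declared as follows: event $ev[\bar{y} : \bar{s}]$ $modif$ $\{\psi\}$, with $modif$ = modifies
$q_1$ at $\{(\bar{x}_1) \cdot \psi_1\}, \ldots, q_j$ at $\{(\bar{x}_j) \cdot \psi_j\}$, where the free variables in each $\psi_k$ are
included in $\bar{x}_k, \bar{y}$. Its semantics is defined as $[\![ev]\!] = \exists \bar{y} : \bar{s} \cdot (\psi \land [\![modif]\!])$, where


$$[\![modif]\!] = \Big(\bigwedge_{r \in \mathcal{R} \setminus \{q_1, \ldots, q_j\}} unchanged[r, \bar{x}, \top]\Big) \land \Big(\bigwedge_{1 \le k \le j} unchanged[q_k, \bar{x}_k, \neg\psi_k]\Big)$$


where each list of variables has sorts corresponding to the profile of the relation it is applied to.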
For any binary relation r that enables btw[r], the ternary relation btw[r]
stating that there exists an acyclic path between two elements passing through
a third element is axiomatized in FO following [18].


Definition 14 (Semantics of between). Given a binary relation symbol $r$,
the semantics of $btw[r]$ is given by adding axioms of transitivity, antisymmetry,
partial totality, partial reflexivity, cycle maximality, transitivity of reachability,
path consistency, taken from [18] in addition to the following axiom:


$$\forall x, y : s \cdot [r(x,y) \Leftrightarrow (btw[r](x,y,y) \land (\forall z : s \cdot btw[r](x,z,z) \Rightarrow btw[r](x,y,z)))] \quad (S)$$


The property (TC) relating $btw[r]$ and $r^{*}$ can be deduced from the axioms provided
that the domain of $s$ is finite.


$$\forall x, y : s \cdot [btw[r](x,y,y) \Leftrightarrow r^{*}(x,y)] \quad (TC)$$

Then, calling BTW the conjunction of all between axioms, $[\![btw[r]]\!] = \mathbf{G}\,\mathrm{BTW}$.


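Definition 15 (Semantics of Cervino). Let Mch be a Cervino machine with
axioms $\psi_1, \ldots, \psi_n$, events $ev_1, \ldots, ev_m$ and such that the relations enabling btw
are $r_1, \ldots, r_l$. Then its semantics is given by the following $\mathrm{FOLTL}^{*}_{=}$ formula:


$$[\![Mch]\!] = \varphi_0 \land (\mathbf{G}\,\varphi_{tr}) \land \varphi_{btw}$$


where $\varphi_0 = \bigwedge_{i=1}^{n} \psi_i$, $\varphi_{tr} = \bigvee_{i=1}^{m} [\![ev_i]\!]$ and $\varphi_{btw} = \bigwedge_{1 \le i \le l} [\![btw[r_i]]\!]$.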


The semantics of a Cervino machine is then an $\mathrm{FOLTL}^{*}_{=}$ formula describing
the set of its traces. But, since we aim at verifying systems, we are not only
interested in the set of traces but also in the set of counterexamples of a property.
This set is also described by an $\mathrm{FOLTL}^{*}_{=}$ formula which is the conjunction of
the semantics of the machine and the negation of the property we aim to check.


Definition 16 (Counterexamples). Let Mch be a Cervino machine and $\varphi$
an $\mathrm{FOLTL}^{*}_{=}$ formula. Then we define $[\![Mch]\!]_{\varphi} = [\![Mch]\!] \land [\![\neg\varphi]\!]$.


4 Basic Transformations 


In this section, we present basic transformations used to build the more complex 
TFC and TTC tactics (respectively presented in Sect. 6 and 7). These transfor- 
mations are used to map (the semantics of) a system specification into a more 
general Geneva formula. 


4.1 Transforming Equality 


Equality is replaced⁴ by a dynamic congruence relation $=_s$, for every sort $s$. The
signature is therefore extended with these fresh $=_s$ relations.


4 In practice, we ensure that the semantics of the modifies section, which uses equality, 
is also affected by this transformation. 
Definition 17 (Equality transformation). Given a fresh binary relation $=_s$
for every sort $s$ of a formula, the transformation of equality is defined recursively:


- $Abs_{=}(t_1 = t_2) = (t_1 =_s t_2)$ if the sort of $t_1$ (and necessarily of $t_2$) is $s$
- $Abs_{=}(\ell) = \ell$
- (the rest is just a recursive walk on formulas)

Furthermore, the following set $Eq_{=}$ of axioms is added to the whole specification:


- for any sort $s$: $\mathbf{G}\,\forall x : s \cdot x =_s x$

- for any sort $s$: $\mathbf{G}\,\forall x : s, y : s \cdot x =_s y \Rightarrow y =_s x$

- for any sort $s$: $\mathbf{G}\,\forall x : s, y : s, z : s \cdot x =_s y \land y =_s z \Rightarrow x =_s z$

- for any relation $r$ and adequate sorts $\bar{s}$ conforming to the profile of $r$:
$\mathbf{G}\,\forall \bar{x} : \bar{s}, \bar{y} : \bar{s} \cdot (x_1 =_{s_1} y_1 \land \ldots \land x_n =_{s_n} y_n) \Rightarrow (r(\bar{x}) \Rightarrow r(\bar{y}))$


Lemma 1. Given an $\mathrm{FOLTL}_{=}$ formula $\varphi$, if $\varphi$ is satisfiable then $Abs_{=}(\varphi)$ is
satisfiable (and does not contain $=$ anymore).


Proof. Proof validated in Coq. It is easy to see that equality is a particular case 
of the equivalence relation introduced by this transformation. 


4.2 Restricted Skolemization 


The following transformation corresponds to a form of Skolemization meant to
create only new constant symbols. Its main purpose is to introduce constants
that can then be used by instantiation (Sect. 4.3). Existentially-quantified
variables can be substituted by fresh constants, except when under a G connective.


Definition 18 (Skolemization). Skolemization is defined by the following
operation (all fresh constant symbols are added to the signature):


- $Abs_{\exists}(t_1 = t_2) = (t_1 = t_2)$ and $Abs_{\exists}(\ell) = \ell$
- $Abs_{\exists}(\exists y : s \cdot \varphi) = Abs_{\exists}(\varphi[y \mapsto c])$ where $c$ is a fresh constant symbol
- (the rest is just a recursive walk on formulas)


Lemma 2. Given an $\mathrm{FOLTL}_{=}$ formula $\varphi$, then $Abs_{\exists}(\varphi)$ and $\varphi$ are
equisatisfiable.


Proof. Proof validated in Coq. Corresponds to a usual Skolemization 
procedure. 
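
The restricted Skolemization can be sketched as follows. This is an illustration under stated assumptions rather than the prototype's code: formulas use a hypothetical tuple encoding (`("rel", name, args)`, `("eq", t1, t2)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, `("X", f)`, `("F", f)`, `("G", f)`, `("forall", x, s, f)`, `("exists", x, s, f)`), the input is assumed to be in NNF with no existential quantifier under a universal one, bound variable names are assumed pairwise distinct, and the generated names `sk0`, `sk1`, ... are assumed not to clash with existing symbols.

```python
import itertools

def substitute(f, var, term):
    """Replace free occurrences of variable `var` by `term` in formula `f`."""
    op = f[0]
    if op == "rel":
        return ("rel", f[1], tuple(term if a == var else a for a in f[2]))
    if op == "eq":
        return ("eq",) + tuple(term if a == var else a for a in f[1:])
    if op in ("forall", "exists"):
        if f[1] == var:  # var is rebound here, so nothing below is free
            return f
        return (op, f[1], f[2], substitute(f[3], var, term))
    return (op,) + tuple(substitute(g, var, term) for g in f[1:])

def skolemize(f, prefix="sk"):
    """Replace existential quantifiers by fresh constants, except under G.

    Returns the transformed formula and a dict {fresh constant: sort}.
    """
    counter = itertools.count()
    new_consts = {}

    def walk(g):
        op = g[0]
        # Sub-formulas under G are left untouched, as are universally
        # quantified ones (assumed to contain no existential quantifier).
        if op in ("rel", "eq", "G", "forall"):
            return g
        if op == "exists":
            c = f"{prefix}{next(counter)}"
            new_consts[c] = g[2]
            return walk(substitute(g[3], g[1], c))
        return (op,) + tuple(walk(h) for h in g[1:])

    return walk(f), new_consts
```

The second component of the result lists the constants to be added to the signature, which is what the later instantiation step relies on.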


4.3 Instantiation 


One of the main limitations of the Geneva fragment is the prohibition of temporal 
operators under universal quantifiers. The solution we propose to this problem 
is to finitely instantiate such universal quantifiers. The following transformation 
formalizes this idea: all universal quantifiers over temporal formulas are replaced 
by a conjunction over the set of constants and existentially-bound variables. 
Definition 19 (Forall instantiation). Given a set $T$ of constant and variable
symbols, we define the transformation of universal quantifiers as follows:


- $Abs_{\forall,T}(t_1 = t_2) = (t_1 = t_2)$ and $Abs_{\forall,T}(\ell) = \ell$
- $Abs_{\forall,T}(\exists y : s \cdot \varphi) = \exists y : s \cdot Abs_{\forall,T \cup \{y\}}(\varphi)$
- if $\varphi \in$ FO ($\varphi$ does not contain temporal connectives) then $Abs_{\forall,T}(\forall x : s \cdot \varphi) =
\forall x : s \cdot \varphi$, otherwise $Abs_{\forall,T}(\forall x : s \cdot \varphi) = \bigwedge_{c \in T_s} Abs_{\forall,T}(\varphi[x \mapsto c])$ (where $T_s$ is the
set of terms in $T$ of sort $s$)
- (the rest is just a recursive walk on formulas)


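Remark 1. There is no need to transform a universal quantifier if all temporal
operators in its scope permute with it, for instance: $\forall x \cdot \mathbf{G}P$ is equivalent to
$\mathbf{G}(\forall x \cdot P)$ and $\forall x \cdot (\mathbf{X}P) \Rightarrow (\mathbf{X}Q)$ is equivalent to $\mathbf{X}(\forall x \cdot P \Rightarrow Q)$.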


Lemma 3. Given an $\mathrm{FOLTL}_{=}$ formula $\varphi$, if $\varphi$ is satisfiable and $T \subseteq Const$ then
$Abs_{\forall,T}(\varphi)$ is satisfiable.


Proof. Proof validated in Coq. This operation consists in instantiating universal 
operators, thus preserving satisfiability. 
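
The instantiation of Definition 19 can be sketched in the same spirit. Again this is only an illustration: the tuple encoding of formulas is a hypothetical one (`("rel", name, args)`, `("eq", t1, t2)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, `("X", f)`, `("F", f)`, `("G", f)`, `("forall", x, s, f)`, `("exists", x, s, f)`), the set $T$ is passed as a dictionary from term names to sorts, bound variable names are assumed pairwise distinct, and the atom `("true",)` used for an empty conjunction is an addition of this sketch.

```python
from functools import reduce

TEMPORAL = {"X", "F", "G"}

def substitute(f, var, term):
    """Replace free occurrences of variable `var` by `term` in formula `f`."""
    op = f[0]
    if op == "rel":
        return ("rel", f[1], tuple(term if a == var else a for a in f[2]))
    if op == "eq":
        return ("eq",) + tuple(term if a == var else a for a in f[1:])
    if op in ("forall", "exists"):
        if f[1] == var:
            return f
        return (op, f[1], f[2], substitute(f[3], var, term))
    return (op,) + tuple(substitute(g, var, term) for g in f[1:])

def has_temporal(f):
    """True if f contains a temporal connective."""
    op = f[0]
    if op in ("rel", "eq"):
        return False
    if op in TEMPORAL:
        return True
    subs = f[3:] if op in ("forall", "exists") else f[1:]
    return any(has_temporal(g) for g in subs)

def instantiate(f, terms):
    """Instantiate universal quantifiers over temporal formulas on `terms`."""
    op = f[0]
    if op in ("rel", "eq"):
        return f
    if op == "exists":
        # the bound variable becomes available for later instantiations
        return (op, f[1], f[2], instantiate(f[3], {**terms, f[1]: f[2]}))
    if op == "forall":
        if not has_temporal(f[3]):
            return f  # pure FO body: keep the quantifier
        insts = [instantiate(substitute(f[3], f[1], t), terms)
                 for t, s in sorted(terms.items()) if s == f[2]]
        return reduce(lambda a, b: ("and", a, b), insts) if insts else ("true",)
    return (op,) + tuple(instantiate(g, terms) for g in f[1:])
```

With $T$ containing two constants `a` and `b` of sort Node, the encoding of $\forall x : Node \cdot \mathbf{F}\,elected(x)$ becomes the encoding of $\mathbf{F}\,elected(a) \land \mathbf{F}\,elected(b)$, while a universal quantifier over a pure FO body is left unchanged.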


4.4 Addressing Transitive Closure and the Between Relation 


Since we target fragments of FOLTL (without transitive closure), we define the
transformation $Abs_{*}(\cdot)$, which leaves a formula unchanged except that it uninterprets
the operator $*$, i.e., $Abs_{*}(\varphi)$ returns $\varphi$ where every occurrence of $r^{*}$ is considered
as a new relation symbol, unrelated to $r$.

Besides, the between relation axioms do not fit into Geneva or LTR, so we
define their abstract semantics as follows.


Definition 20 (Transformation of between axioms). Given a binary relation
symbol $r$, we define $(btw[r]) = \mathbf{G}\,\mathrm{BTW}$ where BTW is the conjunction of the
axioms from Definition 14, except that


- the axiom (S) is replaced by the axiom (AS) (in order to prevent an existential
quantifier in the scope of a universal one)

- and the property (TC) relating $r^{*}$ and $btw[r]$ is now considered as an axiom
(since $r^{*}$ has no semantics in the targeted FOLTL fragments)

$$\forall x, y : s \cdot [r(x,y) \Rightarrow (btw[r](x,y,y) \land (\forall z : s \cdot btw[r](x,z,z) \Rightarrow btw[r](x,y,z)))] \quad (AS)$$


$$\forall x, y : s \cdot [btw[r](x,y,y) \Leftrightarrow r^{*}(x,y)] \quad (TC)$$


4.5 Geneva Transformation 


The basic transformations introduced above are mainly used together, in a spe- 
cific order. 
Definition 21 (Geneva Transformation). We define:
$$Abs_{Gen}(\varphi) = Abs_{*}(Abs_{\forall,Const}(Abs_{\exists}(Eq_{=} \land Abs_{=}(\varphi))))$$


Theorem 3. Given $\psi \in \mathrm{FOLTL}^{*}_{=}(\forall)$ and $\varphi \in \mathrm{FOLTL}^{*}_{=}(\mathbf{X}, \mathbf{F}, \forall)$ then
$Abs_{Gen}(\exists y_1 : s_1 \cdot (\psi \land \mathbf{G}\,\exists y_2 : s_2 \cdot \varphi))$ belongs to the Geneva fragment.


Proof. Recall the conditions to belong to Geneva: (1) no G operator in the
scope of an existential quantifier that is itself under a G connective; (2) no
existential quantifier in the scope of a universal quantifier; (3) no equality;
(4) no temporal connective in the scope of universal quantifiers; and (5) no
transitive closure. Given $\psi, \varphi$ satisfying the given hypotheses, let us write
$\alpha = \exists y_1 : s_1 \cdot (\psi \land \mathbf{G}\,\exists y_2 : s_2 \cdot \varphi)$. Then, in $\alpha$, existential quantifiers appear
either at the head of the formula or under a G operator over the $\varphi$ formula.
Since $\varphi$ contains no other temporal connectives than X and F, condition (1) is
met. Condition (2) is met as all existential quantifiers appear before universal
quantifiers. $Abs_{=}(\cdot)$ ensures that equality is not used in the final formula, thus
ensuring condition (3). $Abs_{\forall,Const}(\cdot)$ instantiates all universal quantifiers that
contain temporal connectives in their scope (we assume that if such an operator
could have been swapped with a universal quantifier, it has been done beforehand),
which ensures condition (4). Finally $Abs_{*}(\cdot)$ erases the reflexive-transitive
closure, ensuring condition (5). Since it is obvious that none of the transformations
can introduce formulas breaking any of the conditions, we conclude that
$Abs_{Gen}(\alpha)$ belongs to Geneva.


5 TEA: Transforming Existential Quantifiers 


We now present the fully-automatic TEA transformation. It starts with the
observation that the formula specifying events (see Definition 13) is of the shape
$\mathbf{G}\,\exists \bar{x} \cdot \bigvee_i ev_i(\bar{x})$, that is, in every state, at least one event is fired. The gist of the
TEA transformation is then twofold: (1) we replace these existential quantifiers
by universal ones; (2) for every such existential quantifier, we add a fresh relation
$E$, which holds only for the constant semantically associated to this quantifier.

The whole resulting abstract specification lies in the LTR fragment, which 
enjoys the BDP (Theorem 1). The formula specifying events is however more 
general than the original one, because it allows more transitions to happen. The 
abstract system may thus violate a property holding on the original specification. 
But it is now decidable to check whether the property holds in the abstract 
system and, if so, this entails that it also holds in the original system. 

Before presenting the transformation, notice that, in the following, we consider
event formulas, that is primed $\mathrm{FO}_{=}$ formulas of the shape $\varphi = \exists y_1 :
s_{y_1}, \ldots, y_n : s_{y_n} \cdot \forall x_1 : s_{x_1}, \ldots, x_m : s_{x_m} \cdot \psi$, where $\psi$ is in NNF and does not
contain any first-order quantifiers. These formulas naturally arise when putting
the semantics of events in prenex normal form. We also suppose we have a supply
of fresh relation symbols, written $E_i$ (one for every $y_i$, $1 \le i \le n$).
To devise the transformation and prove its soundness, we first introduce a
formula specifying that the E relations are functional. This schema appears in 
the final abstract specification. 


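Definition 22 (Functional E relations). Given an event formula $\varphi = \exists y_1 :
s_{y_1}, \ldots, y_n : s_{y_n} \cdot \forall x_1 : s_{x_1}, \ldots, x_m : s_{x_m} \cdot \psi$, we define the functional formula
based on $\varphi$ as:

$$AX(\varphi) = \mathbf{G}\Big(\bigwedge_{i=1}^{n} \forall z_1, z_2 : s_{y_i} \cdot (E_i(z_1) \land E_i(z_2)) \Rightarrow z_1 = z_2\Big)$$

where $E_1, \ldots, E_n$ are fresh unary relation symbols.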


As we introduce these E relations, we also define an enrichment of the event 
formula accounting for the extended signature. This new formula appears as a 
link between the two lemmas entailing soundness. 


Definition 23 (Enriched event formula). Given an event formula $\varphi = \exists y_1 :
s_{y_1}, \ldots, y_n : s_{y_n} \cdot \forall x_1 : s_{x_1}, \ldots, x_m : s_{x_m} \cdot \psi$, we define the enriched event formula
based on $\varphi$ as:


n 


ọ = AX (p) ^ EA UN A Vai Srei Ema Sam u)| 


i=l 


where Ey,...,E, are fresh unary relation symbols. 


We now present the essential part of the transformation, transforming an
event formula $\varphi$ into a purely universal one $U(\varphi)$, more general than $\varphi$. In other
words, $U(\varphi)$ allows more transitions than $\varphi$ if we ignore the specification of
$E_1, \ldots, E_n$. To do that, for any variable $y$ whose corresponding fresh relation is
$E$, we proceed with the following steps. First, equality between $y$ and another
variable is replaced with the relation $E$ applied to the latter, and any other literal
$\ell$ containing $y$ is replaced by $E(y) \Rightarrow \ell$. Once these transformations are done, it is
possible to replace existential quantification over $y$ by a universal quantification.


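Definition 24 (Transformation). Given an event formula $\varphi$ of shape $\exists y_1 :
s_{y_1}, \ldots, y_n : s_{y_n} \cdot \forall x_1 : s_{x_1}, \ldots, x_m : s_{x_m} \cdot \psi$, we define the (TEA) transformation
function on $\varphi$ as:


$$U(\varphi) = \forall y_1 : s_{y_1}, \ldots, y_n : s_{y_n} \cdot U_{\bar{y}}(\forall x_1 : s_{x_1}, \ldots, x_m : s_{x_m} \cdot \psi)$$


where $\bar{y} = \{y_1, \ldots, y_n\}$ and where $E_1, \ldots, E_n$ are fresh relation symbols (one for
every $y \in \bar{y}$); with $U_{\bar{y}}(\psi)$ defined recursively as follows:


- $U_{\bar{y}}(y_i = y_j) = (E_i(y_i) \Rightarrow E_j(y_i)) \land (E_j(y_j) \Rightarrow E_i(y_j)) = (\neg E_i(y_i) \lor E_j(y_i)) \land
(\neg E_j(y_j) \lor E_i(y_j))$
- $U_{\bar{y}}(y_i \neq y_j) = (E_i(y_i) \Rightarrow \neg E_j(y_i)) \land (E_j(y_j) \Rightarrow \neg E_i(y_j)) = (\neg E_i(y_i) \lor
\neg E_j(y_i)) \land (\neg E_j(y_j) \lor \neg E_i(y_j))$
- $U_{\bar{y}}(y_i = d) = U_{\bar{y}}(d = y_i) = E_i(d)$ where $d \notin \bar{y}$ ($d$ is either a constant or a
variable in $\bar{x}$)
- $U_{\bar{y}}(y_i \neq d) = U_{\bar{y}}(d \neq y_i) = \neg E_i(d)$ where $d \notin \bar{y}$ ($d$ is either a constant or a
variable in $\bar{x}$)
- $U_{\bar{y}}(\ell) = \big(\bigwedge_{k=1}^{j} E_{a_k}(y_{a_k})\big) \Rightarrow \ell = \big(\bigvee_{k=1}^{j} \neg E_{a_k}(y_{a_k})\big) \lor \ell$ where $\ell$ is a (possibly
primed) literal and $\{y_{a_1}, \ldots, y_{a_j}\} = FV(\ell) \cap \bar{y}$
- (the rest is just a recursive walk on formulas)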


Example 1. Consider the following event formula, stating that there is an event
making $R$ true in the next state for a variable $y$ (other variables remain
unchanged w.r.t. $R$): $\varphi = \exists y : A \cdot R'(y) \land (\forall x : A \cdot x \neq y \Rightarrow (R(x) \Leftrightarrow R'(x)))$,
that is, in prenex form: $\exists y : A \cdot \forall x : A \cdot R'(y) \land (x = y \lor (\neg R(x) \land \neg R'(x)) \lor (R(x) \land R'(x)))$. Then there is only one fresh $E$ relation, and $U(\varphi)$ is:

$$\forall y, x : A \cdot (\neg E(y) \lor R'(y)) \land (E(x) \lor (\neg R(x) \land \neg R'(x)) \lor (R(x) \land R'(x)))$$
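
To make the effect of the transformation concrete, the following Python sketch (an illustration, not part of the Coq development) evaluates the body of $\varphi$ from Example 1 and $U(\varphi)$ by brute force over a small finite domain. Encoding $R$, $R'$ and $E$ as Python sets, and all helper names, are assumptions of this sketch.

```python
from itertools import combinations, product

def subsets(dom):
    """All subsets of a finite domain, as Python sets."""
    return [set(c) for k in range(len(dom) + 1) for c in combinations(dom, k)]

def phi_body(y, R, Rn, dom):
    """Body of phi for a witness y: R'(y) holds and R is unchanged elsewhere."""
    return y in Rn and all((x in R) == (x in Rn) for x in dom if x != y)

def u_phi(E, R, Rn, dom):
    """U(phi): forall y, x . (not E(y) or R'(y)) and (E(x) or (R(x) <-> R'(x)))."""
    return all(
        (y not in E or y in Rn) and (x in E or (x in R) == (x in Rn))
        for y, x in product(dom, repeat=2)
    )

def witnesses_carry_over(dom):
    """Any step allowed by phi with witness y is allowed by U(phi) when E = {y}."""
    return all(
        u_phi({y}, R, Rn, dom)
        for R in subsets(dom) for Rn in subsets(dom) for y in dom
        if phi_body(y, R, Rn, dom)
    )

DOM = [0, 1, 2]
print(witnesses_carry_over(DOM))
# With E left empty, U(phi) accepts a stuttering step that phi itself rejects.
print(u_phi(set(), set(), set(), DOM),
      any(phi_body(y, set(), set(), DOM) for y in DOM))
```

The second check illustrates why $U(\varphi)$ is more general than $\varphi$ when the specification of $E$ is ignored: with $E$ empty, a step that leaves $R$ empty is accepted by $U(\varphi)$ although no witness $y$ exists for $\varphi$.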


Now, the following lemma states that every model of the enriched event 
formula is also a model for the transformed event formula. 


Lemma 4. Given an event formula $\varphi = \exists y_1 : s_{y_1}, \ldots, y_n : s_{y_n} \cdot \forall x_1 : s_{x_1}, \ldots, x_m : s_{x_m} \cdot \psi$, we have $\hat{\varphi} \models U(\varphi)$.


Proof. Proof validated in Coq. 


Lemma 5 applies to a formula representing a whole specification: if such a 
specification is satisfiable, then a certain transformed version of it is satisfiable 
too. 


Lemma 5. Let $\theta$ be an FOLTL= formula, and $\varphi$ be an event formula on the
same signature. Then if $\theta \land G\varphi$ is satisfiable, $\theta \land G(U(\varphi)) \land \mathit{AX}(\varphi)$ is also
satisfiable.


Proof. Proof validated in Coq. 


Definition 25 (Abstract semantics). Given a Cervino machine Mch such
that the relations enabling btw are $r_1, \ldots, r_l$, we define $U(\mathit{Mch}) = \varphi_0 \land G\,U(\varphi_{tr}) \land \varphi_{btw}$, where $\varphi_0$ and $\varphi_{tr}$ are defined as in Definition 15 and
$\varphi_{btw} = \bigwedge_{1 \le i \le l} (btw[r_i])$. Also, given an FOLTL= formula $\varphi$, we define $U_\varphi(\mathit{Mch}) =$


Abs,(U(Mch) A —¢). 


Theorem 4 (Soundness). If $[\![\mathit{Mch}]\!]_\varphi$ is satisfiable, then $U_\varphi(\mathit{Mch})$ is also satisfiable.


Proof. This is a direct application of Lemma 5. 


Theorem 5. Given a Cervino machine Mch such that $\varphi_0$ and $\varphi_{tr}$ are defined
as in Definition 15, if $\varphi_0, \varphi \in$ LTR then $U_\varphi(\mathit{Mch}) \in$ LTR.


Proof. Directly follows from the definition of U(.). 
6 TFC: Transforming Frame Conditions


The TEA transformation has the advantage of being fully automatic, but it
can be inconclusive in a number of cases. For instance, the verification of a
distributed system involving strong interactions between its components, which
induce events with two or more parameters, is likely to be inconclusive using
TEA. This is because the universal quantifiers introduced by TEA abstract
these interactions (which are naturally expressed with existential
quantifiers) too drastically.

In this section, we present another transformation, called TFC, which over- 
comes these limitations but requires some intervention from the specifier. 

Instead of targeting the LTR fragment, we now target the Geneva one, which
allows for existential quantifiers in the scope of G, but forbids temporal formulas
in the scope of a universal quantifier. As a consequence, frame conditions, which
are typically of shape $\forall x : s \cdot \psi_{cond} \Rightarrow (r(x) \Leftrightarrow X\,r(x))$, are not expressible in
Geneva. In order to fit into it, such universal quantifiers are instantiated over
constants (see $\mathit{Abs}_{\forall,const}(\cdot)$ defined in Sect. 4.3). But then a large part of the
information included in the frame conditions is lost. Therefore, we associate
a particular kind of invariant properties, called stability axioms, with each
event, as a finer transformation of frame conditions. Intuitively, a stability axiom
is a pure FO formula that is preserved by an event. Since it is expressed in pure
FO, the preservation of a stability axiom is then expressible in Geneva.


Definition 26 (Stability Axiom). Given a set of frame conditions $C$, an FO
formula $\varphi$ is a stability axiom for $C$ if $C \models \varphi \Rightarrow X\varphi$.
$St_C$ denotes the set of stability axioms for $C$.


The specification of stability axioms is a creative step, but it can be eased with
the help of a syntactic condition that is sufficient for a formula to be a stability axiom. The
idea is that a formula of the following shape is necessarily a stability axiom:
$\psi_{hyp} \Rightarrow \varphi$, where $\psi_{hyp}$ corresponds to the guard of a frame condition that leaves
a relation $r$ unchanged, and $\varphi$ only refers to the relation $r$.


Example 2. In order to illustrate the use of stability axioms, let us consider the
leader election distributed system, introduced in Sect. 2. Since TEA does not
succeed in proving the safety property, we can try TFC with the following
stability axiom for event send:


∀ x, y : Node · !succ(src, x) ⇒ (!toSend(x, y) ∨ (x ≠ lmax ∧ btw[succ](x, lmax, y)))


This axiom expresses that if a node x different from the successor of src has 
an ID y in its mailbox, then the node with the greatest ID is located between x 
and y (recall that a node and its ID are conflated). This means that outside the 
scope of the event, an ID cannot jump over the node with the greatest identifier. 

Exhibiting this stability axiom requires some work. It would also be possible 
to proceed using an inductive invariant but, since the property to check is not 
inductive, doing so would also require some effort. 
Example 3. In order to illustrate the difference between stability axioms and
inductive invariants, we take a toy token protocol as an example. For the sake
of simplicity, we consider a property to check that is already inductive. The
protocol features one token passing from node to node with only one send event
send(x, y), with body: token(x) ∧ !token'(x) ∧ token'(y) ∧ frame, where frame :=
∀ z · (z ≠ x ∧ z ≠ y) ⇒ (token(z) ⇔ token'(z)).

The (inductive) property to check is that there is always at most one node
holding the token. To prove this property without relying on its inductiveness, we
can use the following stability axiom: stab := ∀ z · (z ≠ x ∧ z ≠ y) ⇒ !token(z).
Contrary to the inductive invariant, the stability axiom has free variables
matching the parameters of the event (which are implicitly quantified existentially).
Also, the preservation of the stability axiom follows from the frame condition alone, as
frame ⊨ stab ⇒ X stab, while the preservation of the inductive invariant follows
from the whole transition.
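
The entailment frame ⊨ stab ⇒ X stab can be checked exhaustively on a small instance. The Python sketch below is only an illustration under stated assumptions: the token relation and its next-state copy are encoded as sets of nodes, the domain is finite, and the function names are ours rather than Cervino syntax.

```python
from itertools import combinations, product

def subsets(dom):
    """All subsets of a finite domain, as Python sets."""
    return [set(c) for k in range(len(dom) + 1) for c in combinations(dom, k)]

def frame(x, y, token, token_next, dom):
    """frame: every node other than x and y keeps its token status."""
    return all((z in token) == (z in token_next)
               for z in dom if z != x and z != y)

def stab(x, y, token, dom):
    """stab: no node other than x and y holds the token."""
    return all(z not in token for z in dom if z != x and z != y)

def stability_preserved(dom):
    """Check frame |= stab => X stab for all parameters and all pairs of states."""
    return all(
        stab(x, y, token_next, dom)
        for x, y in product(dom, repeat=2)
        for token in subsets(dom)
        for token_next in subsets(dom)
        if frame(x, y, token, token_next, dom) and stab(x, y, token, dom)
    )

print(stability_preserved([0, 1, 2, 3]))
```

Note that the check uses only the frame condition and never the rest of the event body, which is exactly what distinguishes a stability axiom from an inductive invariant in this example.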


Remark 2. Notice that this property is also true for the nodes that are in the 
scope of the event, i.e., src and its successor. So in this case, the stability axiom 
is very close to an invariant property. But this is not the case in general. A 
distinguishing aspect is that TFC with this stability axiom succeeds in proving 
the safety property, whereas it would not be possible to deduce it from the 
“invariant” version of this stability axiom. 


The TFC transformation is performed in two phases: 


1. Stability axioms, which are provided by the specifier, are added to the body 
of each event. At this step, the semantics of Cervino is strengthened by the 
transformation. The obtained formula is not in the Geneva fragment, in par- 
ticular because of the frame conditions. 

2. The Geneva transformation, which is presented in Sect. 4, is applied. In
particular, the frame conditions are abstracted by equality transformation and
instantiation, but the stability axioms are left unchanged.


Definition 27 (Event enrichment with a stability axiom). Let ev be an 
event of a Cervino machine declared as: event evfy : 3] modif{r} and C be the 
frame condition of ev, C = [modif]. Given a stability axiom T for C, we define 
the enrichment p(ev,Z)) of ev with T as: p(ev,Z) = 3g: 8-7 AC A (T => XT). 


Definition 28 (Cervino machine enrichment with stability axioms).
Let Mch be a Cervino machine with axioms $\psi_1, \ldots, \psi_n$, events $ev_1, \ldots, ev_m$
declared as event $ev_i[\bar{y}_i : \bar{s}_i]$ modif$\{\bar{r}_i\}$ for each $i \in 1..m$ and such that the
relations enabling btw are $r_1, \ldots, r_l$. Let sta be a function mapping each event
to a stability axiom for the corresponding frame condition. Using the same notation
as Definition 27, we define the stability axiom enrichment $p(\mathit{Mch}, sta)$ of Mch as

$$p(\mathit{Mch}, sta) = \varphi_0 \land G\,\varphi_{tr} \land \varphi_{btw}$$

where $\varphi_0 = \bigwedge_{i=1}^{n} \psi_i$, $\varphi_{tr} = \bigvee_{i=1}^{m} p(ev_i, sta(ev_i))$ and $\varphi_{btw} = \bigwedge_{1 \le i \le l} (btw[r_i])$.
Definition 29 (Abstract semantics). Given a Cervino machine Mch and a
function sta, mapping each event to a stability axiom, we define the stability
axiom semantics as $F(\mathit{Mch})_\varphi = \mathit{Abs}_{Gen}(p(\mathit{Mch}, sta) \land \neg\varphi)$.


Theorem 6 (Soundness). If $[\![\mathit{Mch}]\!]_\varphi$ is satisfiable then $F(\mathit{Mch})_\varphi$ is satisfiable.


Proof. Follows from Lemmas 1, 2 and 3.


Theorem 7. If $\neg\varphi \in$ LTR then $F(\mathit{Mch})_\varphi \in$ Geneva.


Proof. Follows from Theorem 3. 


7 TTC: Transforming Reflexive-Transitive Closure 


We now present a simple, effective transformation technique to approximate
reflexive-transitive closure (which is present in Cervino and its FOLTL* semantics).
This technique has proved useful for establishing some liveness properties.
As is well known, transitive closure cannot be fully specified in pure FO. On
the other hand, it can be specified in pure FOLTL, but the axiomatization we
are aware of does not fit in the fragments considered here. However, it is possible
to define an interesting approximation that does fit in the Geneva fragment.
Informally, the crux of our technique relies on the following observation:
any property propagating along a binary relation will eventually propagate to
the reflexive-transitive closure thereof. This is proved (see Theorem 8 below) by
following the definitions of the transitive closure and of the eventually connective.


Definition 30 (Propagation schema). Given binary relations $r$ and $t$ on a
sort $s$, and a formula $P$ with $k + 1$ free variables ($k \ge 0$), the first of which
(of sort $s$) is distinguished in the following, and given $k$ variables $\bar{x}$ of appropriate
typing, we define the propagation and closure schemas as follows:

$$\mathit{Propagates}[r, P, \bar{x}] = \forall u, v : s \cdot r(u, v) \Rightarrow G(P[u, \bar{x}] \Rightarrow F\,P[v, \bar{x}])$$
$$\mathit{Closure}[r, t, P, \bar{x}] = \mathit{Propagates}[r, P, \bar{x}] \Rightarrow \mathit{Propagates}[t, P, \bar{x}]$$


Theorem 8 (Propagation). Given a binary relation $r$ on a sort $s$, the following
property over its reflexive-transitive closure $r^*$ is valid: $\mathit{Closure}[r, r^*, P, \bar{x}]$.

Proof. Proof validated in Coq.

The proof sketch is the following: we consider the set of elements to which the
property eventually propagates. Then we use the hypothesis that the property
propagates along a binary relation $r$ to show that this set is closed under the
relation $r$. Then, as the transitive closure from some element is the smallest set
closed under the relation $r$, we know that the property propagates to any element
in the transitive closure.

We prove that under the $\mathit{Propagates}[r, P, \bar{x}]$ hypothesis, for any $u$, the set
of $v$'s that are reachable from $u$ along $r$ is included in the set of $v$'s satisfying
$G(P[u, \bar{x}] \Rightarrow F\,P[v, \bar{x}])$. Let $M$ be a structure and $C$ an assignment s.t. $M, C \models \mathit{Propagates}[r, P, \bar{x}]$. We assume that there is an instant $i$ such that $P[u, \bar{x}]$ holds
(otherwise the satisfaction of the axiom is trivial). Then $M, i, C \models F\,P[u, \bar{x}]$. Also,
given $v$ s.t. $M, i, C \models F\,P[v, \bar{x}]$, there exists $k \ge i$ s.t. $M, k, C \models P[v, \bar{x}]$. For any
$v'$ s.t. $M, 0, C \models r(v, v')$, $\mathit{Propagates}[r, P, \bar{x}]$ implies $M, k, C \models F\,P[v', \bar{x}]$. Thus
$M, i, C \models F\,P[v', \bar{x}]$. Then $M, 0, C \models r^*(u, v)$ implies $M, i, C \models F\,P[v, \bar{x}]$. Hence
$\mathit{Closure}[r, r^*, P, \bar{x}]$ is valid.


Given this theorem, the technique we propose consists in replacing the
reflexive-transitive closure of a relation (which fits in Cervino and FOLTL*) by an
uninterpreted relation satisfying the closure schema shown above, for some property
$P$ that depends on the sort of the considered binary relation as well as, possibly,
other arguments. Note that finding such a property $P$ requires creativity: the
specifier must come up with a relevant propagating property.


Example 4. In the case of the leader election example, we use TTC to check that
a leader will be elected at some point. The property we use is propagation along
succ of having a given ID in one's mailbox (Propagates[succ, toSend, id]).


Definition 31 (Abstract semantics). Let Mch be a Cervino machine, such
that $r_{j_1}, \ldots, r_{j_l}$ are binary relations enabling btw, and $r_{k_1}, \ldots, r_{k_m}$ are binary
relations whose reflexive-transitive closure is used in Mch. Now, given formulas
$P_1, \ldots, P_m$, where for every $1 \le i \le m$, $FV(P_i) = \{x, x_1, \ldots, x_{n_i}\}$ (with $x$ the
distinguished free variable), we define the transitive closure transformation as:

$$T(\mathit{Mch}) = \varphi_0 \land G\,\varphi_{tr} \land \varphi_{btw} \land \Big( \bigwedge_{1 \le i \le m} \; \bigwedge_{(c_1, \ldots, c_{n_i}) \in Const^{n_i}} \mathit{Closure}[r_{k_i}, r_{k_i}^*, P_i, (c_1, \ldots, c_{n_i})] \Big)$$

where $\varphi_0$ and $\varphi_{tr}$ are defined as in Definition 15 and $\varphi_{btw} = \bigwedge_{1 \le i \le l} (btw[r_{j_i}])$.


We also define $T(\mathit{Mch})_\varphi = \mathit{Abs}_{Gen}(T(\mathit{Mch}) \land \neg\varphi)$ (notice that, due to the
application of the Geneva transformation, the $r_{k_i}^*$ relations become uninterpreted).


Theorem 9 (Soundness). If $[\![\mathit{Mch}]\!]_\varphi$ is satisfiable then $T(\mathit{Mch})_\varphi$ is satisfiable.


Proof. Follows directly from Theorem 8 and Lemmas 1, 2 and 3.


Theorem 10. If $\neg\varphi \in$ LTR then $T(\mathit{Mch})_\varphi \in$ Geneva.


Proof. Follows from Theorem 3. 


8 Evaluation 


To evaluate the relevance of our three tactics, we applied them to several
models of distributed protocols. Our research questions were (1) to check that our
methods were applicable to real models; (2) to check whether our approach was
efficient enough; and (3) to assess the effort for the specifier to come up with
parameters for TFC and TTC. Our strategy was always first to apply the TEA
tactic. If TEA failed, then in the case of safety properties, we devised stability
axioms in order to apply TFC. Otherwise, for liveness properties and for systems
relying on transitive closure, we relied on TTC.


Specification        Type      Technique  Bound  Effort
TLB shootdown        Safety    TEA        2      -
Dining philosophers  Safety    TEA        2      -
Lock server          Safety    TEA        2      -
Gset (CRDT)          Liveness  TEA        2      -
2Pset (CRDT)         Liveness  TEA        2      -
Leader election      Safety    TFC        7      4
                     Liveness  TTC        6      1
Token ring           Safety    TFC        7      7
                     Liveness  TTC        6      1
FIFO                 Liveness  TTC        6      1


Fig. 3. All verifications take less than 20 s ("effort": estimation of user effort with the
number of atoms (literals and equality tests) used in the TTC or TFC parameters).

The Cervino prototype takes a Cervino specification as input and gener- 
ates Electrum models which are then fed to the Electrum Analyzer [4], which 
itself calls a complete procedure in nuXmv [5]. On a general note, efficiency 
can be compromised in the case of the TTC and TFC tactics due to larger 
inferred bounds than for TEA. Furthermore, the size of LTL formulas generated
by Electrum for nuXmv grows quickly as the tool merely unfolds quantifiers
into conjunctions and disjunctions, depending on the bounds. For this reason,
we leveraged some properties of the Geneva fragment to end up with smaller 
models: (1) the size of each domain is an exact bound rather than just an upper 
one; (2) all constants are distinct; (3) existential quantifiers can be unfolded on 
a limited part of the domain. This is the case because the proof of the BDP for 
the Geneva fragment [24] shows that, if there is a model of a Geneva formula, 
there is a model satisfying these properties. The specifications we evaluated are 
of moderate complexity but are not just toy models: 


TLB shootdown The TLB Shootdown algorithm [3] is part of the Mach operat- 
ing system. Processors keep a cache of page tables in a Translation Look-aside 
Buffer (TLB). The safety property we prove is that whenever the protocol 
ensures that the page table is updated, the corresponding update will be 
flushed by either the initiator or the responder. 

Dining philosophers This classic protocol features an unbounded number of
philosophers sharing forks. We prove a mutual exclusion property, namely that
a fork cannot be simultaneously held by two different philosophers.

Lock server We present a simple lock server protocol studied in Verdi [27] and
Ivy [21]. The protocol features a single server and an unbounded number
of clients willing to hold a lock. The safety property we verify is that two
different clients cannot simultaneously hold the lock.

Gset and 2Pset Conflict-free Replicated Data Types (CRDTs) are a family 
of concurrent protocols where a data structure is replicated in a network 
and where the replicas can be independently and concurrently updated. We 
model the Grow-only Set (G-Set) [26] and the 2-Phase Set [26] CRDTs. In 
both cases, we prove that any update is eventually delivered to all replicas. 

Leader election The leader election protocol, presented in Sect. 2, is inspired
by [6]. Note, however, that in our model a node sends all the contents of its
mailbox at once, which is a strong simplification.

Token ring Token Ring is a classic protocol where a token is passed through 
nodes with mailboxes in a ring. We prove a safety and a liveness property. 
In the first case, we use the TFC tactic with 2 stability axioms, one for 
each event, which basically state that if there is no token apart from the one 
transferred, then no token can appear on unmodified nodes during the event. 
The TTC parameter says that the property of holding the token is the one 
that should propagate, under strong fairness. 

FIFO This protocol is a simple mutual exclusion protocol based on a FIFO 
strategy. We prove a liveness property using TTC, stating that for any integer 
i, being in the i-th position of the list is a propagating property. 


Our conclusions from these case studies are the following (Fig. 3). First, the
TEA tactic is only effective for models involving few interactions, which can be
attributed to the loss of precision when using universal quantifiers. Regarding
TFC, the effort required to find stability axioms seems to be similar to that of finding
an inductive invariant. For TTC, all propagating properties were very simple.
Finally, we noticed that, for more complex systems, TTC and TFC can lead to
problems that are too large for the model-checker to answer in time (e.g., 1 h).


9 Related Work 


The usual way to check a safety property is to exhibit an inductive invariant for
the system. The TEA tactic is completely automatic and can handle safety
properties but remains quite limited. In our experiments, the TFC tactic proved to be
as flexible as an invariant to prove safety properties. Finding stability axioms and
finding an inductive invariant appear similar in difficulty. However, once found,
an inductive invariant is quicker to check than the abstract
system obtained with stability axioms. On the other hand, stability axioms make it
possible to check complex temporal properties.

Regarding liveness properties, important approaches are based on exhibiting
a variant or using the Liveness-to-Safety reduction method proposed in [19]. For
the simple examples done with TEA, such methods would make it possible to prove the
properties with little effort, if done right, but are not fully automatic, contrary
to the TEA tactic. In both cases the computation time is really low.
With the TTC approach, we do not need to exhibit any sort of invariant
and the propagating property to exhibit has always been straightforward. To
our knowledge there is no easier method to prove some of the examples we
presented. For example, the liveness property of the leader election protocol
requires exhibiting a variant and an invariant, and both are harder to exhibit
than the propagating property. The Liveness-to-Safety reduction method also
applies here, but it requires finding an invariant on the system obtained by
reduction, as well as finding an axiomatization of the reflexive-transitive closure
preserving the liveness property (while this axiomatization is embedded in the TTC
tactic). However, despite being more immediate in our examples, the TTC tactic
is less flexible than these two alternatives since it only applies to liveness properties
based on the reflexive-transitive closure.

Our approach can also be compared with the specification of parameterized 
systems. Cubicle [7-10] is an SMT-based model-checker for the verification of 
safety properties on parameterized systems. Cubicle is efficient for challenging 
systems but, contrary to our techniques, it enforces strict syntactic constraints on 
guards and on the checked property. Other techniques based on labelled proof
systems have also been proposed [2]. In [15], the safety of the TLB Shootdown
algorithm is proved using such a technique. The user must exhibit the correct
invariant for the proof system to conclude, whereas the TEA tactic is automatic.
Also, some methods, such as invisible invariants [25], rely on automatically
finding a candidate inductive invariant and then checking whether it is indeed
inductive, without needing any input from the user. Such an approach is
automatic and efficient but only applies to Bounded-Data Parameterized Systems,
while our methods apply to a wider context. While most work on parameterized
systems focuses on safety properties, [11] addresses liveness properties, but
remains essentially theoretical. We remark that the techniques mainly used for
parameterized systems are mostly orthogonal to those presented in this paper, 
and a combination of both could be fruitful. 


10 Conclusion 


We devised three original, sound (but incomplete) transformations that make it
possible to check that a state machine specification, expressed in a rather expressive
fragment of FOLTL=, enjoys a temporal property, expressed in the same setting,
whatever the bounds on domains (associated with sorts) are. The transformations
were proved correct in Coq. We evaluated our approach on several case
studies and found that the transformations were effective and, for the
semi-automatic ones, demanded an effort comparable to other approaches. A
drawback is that the computed bounds can sometimes grow too much for
model-checking to be feasible with the back-end tools we used. Notice that our approach
is orthogonal to the main other approaches (for instance, inference of invariants)
and could certainly be combined with some of them. Once a universally
quantified inductive invariant Inv is found, such a combination would be possible by
adding an axiom of the form G Inv to our abstract specification. This refines the
abstraction while fitting in both LTR and Geneva. This is left for future work.
Abstract. We present a formal framework to certify k-induction-based
model checking results. The key idea is the notion of a k-witness circuit
which simulates the given circuit and has a simple inductive invariant
serving as proof certificate. Our approach allows proofs to be checked by an
independent proof checker by reducing the certification problem to pure
SAT checks and checking a simple QBF with one quantifier alternation.
We also present CERTIFAIGER, the resulting certification toolkit, and
evaluate it on instances from the hardware model checking competition.
Our experiments show the practical use of our certification method.


1 Introduction 


In many verification applications, k-induction [34] (also known as temporal 
induction) is used as a powerful technique that reduces model checking to a series 
of SAT problems. It has been extensively investigated as an effective approach 
for unbounded model checking [18,22]. As a generalisation of simple induction, 
for a given safety property, the k-induction method concerns a base case and an 
inductive case: the base case is a bounded model checking problem with a depth 
of k; the inductive case assumes the property holds for k consecutive steps, then 
checks it also holds for k + 1 steps. The safety property is said to be k-inductive if 
both conditions are satisfied. The nature of the k-induction algorithm allows it to 
be integrated with modern SAT/SMT solvers. For example, reduction techniques 
such as preprocessing have been investigated with k-induction in an incremental 
setting [17]. The present state-of-the-art also concerns combining k-induction 
with existing SAT-based model checking (SMC) techniques including interpo- 
lation and property directed reachability [23,27]. Furthermore, k-induction has 
also been extended to the context of infinite-state systems [13, 19,26,32], as well 
as software verification [16]. Another variant of this line of research is the use of 
k-induction in sequential equivalence checking [31]. 
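
As a small illustration of the base case and inductive case described above, the following Python sketch checks k-inductiveness on an explicit-state toy system by enumerating paths. It is not the SAT-based formulation used in practice: the explicit enumeration, the function names and the example system are assumptions of this sketch, and the transition relation is assumed to be total.

```python
def paths(succ, starts, length):
    """All paths with `length` states that begin in one of `starts`."""
    frontier = [(s,) for s in starts]
    for _ in range(length - 1):
        frontier = [p + (t,) for p in frontier for t in succ(p[-1])]
    return frontier

def k_inductive(states, init, succ, prop, k):
    # Base case: the property holds on the first k states of every initialised path.
    base = all(prop(s) for p in paths(succ, init, k) for s in p)
    # Inductive case: k consecutive good states are always followed by a good state.
    step = all(
        prop(p[-1])
        for p in paths(succ, states, k + 1)
        if all(prop(s) for s in p[:-1])
    )
    return base and step

# Toy system: states 0 and 1 alternate; state 2 is unreachable and leads to the bad state 3.
STATES = [0, 1, 2, 3]
INIT = [0]
NEXT = {0: [1], 1: [0], 2: [3], 3: [3]}

def succ(s):
    return NEXT[s]

def prop(s):
    return s != 3

print(k_inductive(STATES, INIT, succ, prop, 1))  # fails: 2 is good but steps to 3
print(k_inductive(STATES, INIT, succ, prop, 2))  # holds: no good state leads to 2
```

The property is not inductive for k = 1 because of the unreachable state 2, but it becomes inductive for k = 2 since no path of two good states ends in state 2.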

Model checking has been an effective technique for the verification of safety- 
critical systems. In particular, applications deployed in industrial settings such 
as nuclear facilities, increasingly utilise model checking to gain trust in the cor- 
rectness of their designs [20,30,36]. In such ultra safety-critical applications the 
certification that the model checking results are in fact correct is crucial. We
argue that, in model checking, generic machine-checkable certification is still in
its infancy in contrast to related fields. For instance, in SAT competitions [2,24],
certifiable proofs are mandatory. This has helped to improve the trust we have 
in SAT solving results as well as the quality of SAT solvers tremendously. 

Even though counterexample validation is commonly used in model checking
to certify negative verification results through simulation, producing a
generic machine-checkable proof on success is less straightforward. To mitigate
this problem, certification of model checking has been suggested earlier
in [14,21,23,29,33,36,37], but the methods presented in these works are either 
not directly applicable to k-induction (in its vanilla form), produce k-induction 
specific certificates (fail to provide an inductive invariant), or are considered to 
have exponential certificates. This apparently made it hard to, e.g., require all 
model checkers to produce proofs in the hardware model checking competitions. 

As symbolic model checking of bit-level properties for hardware circuits is 
PSPACE-complete, we introduce in this paper a novel certification framework 
for k-induction-based model checking. Our proposed approach generates a fixed 
number of SAT problems together with a one-alternation only QBF, which 
are verified by an independent certifier, thereby enabling the certification of k- 
induction proofs at lower complexity. Our method efficiently extends the given 
model checking problem to finding a simple inductive invariant of a larger circuit 
as a proof of k-induction of the original circuit. In particular, the certificate size
(as a circuit) is shown to be linear in the size of the given model and the inductive
depth. We present CERTIFAIGER, which works as a complete tool suite for
certification, independent of any model checker. Experimental results show that our
technique works efficiently and can be adapted for practical use. 

The rest of the paper is organised as follows: In Sect. 2 we introduce the 
notion of combinational simulation in the context of circuits. In Sect. 3, we study 
the formal property of combinational simulation and define k-induction-based 
model checking with an example. In Sect. 4, we present our proposed certification 
approach followed by theoretical results in terms of k-induction. We describe the 
implementation of our tool suite in Sect. 5, and report on experimental results 
in Sect. 6. Finally, we conclude in Sect. 7. 


2 Circuits 


In this section, we present a slightly non-standard notation to formalize sys- 
tems. It allows us to represent systems and particularly circuits symbolically in 
a compact way and is crucial to reduce notational clutter in the following. 

Let $B(V)$ be the set of Boolean expressions (propositional formulas) over
the Boolean variables $V$. We also write $B(I, L)$ to denote the set of Boolean
expressions over $I \cup L$, where $I$ and $L$ are two sets of Boolean variables. Given
two Boolean expressions $f(V), g(V) \in B(V)$ we call them equivalent, written
$f(V) \equiv g(V)$, if they have the same models. This notation is also applied to
Boolean expressions over different sets of variables by simply interpreting them
over the union of their variables. We use "$\simeq$" for syntactic equivalence [15],
"$\rightarrow$" for syntactic implication, and "$\Rightarrow$" for semantic implication. To define
semantical concepts or abbreviations we stick to equality "$=$".

In the context of this paper, models are expressed in the form of finite logical
circuits, where states can be seen as truth assignments to latches and inputs.
Initial states are defined by the reset values of latches, in our case, represented by
their reset functions. For each latch $l$ in $L$, there is a reset function $r_l(L)$ which
is a formula (Boolean expression) over a set of latches $L$, thus allowing cyclic
definitions. Note that a cyclic definition can lead to unsatisfiable reset formulas,
in which case there are simply no initial states. Additionally, for some $L'' \subseteq L$, we
define $R(L'') = \bigwedge_{l \in L''} l \simeq r_l(L)$ to allow us to analyse reset functions of individual
subsets of latches. The transition relation is expressed as a "next state" formula
associated with each latch, whereas non-determinism comes from inputs (which
act as the environment). The successor value of each latch is defined by applying
its transition function on the current values of latches and inputs. Intuitively, a
safety property specifies that the system must not violate certain behaviours, i.e.,
only "good states" are reachable. In this paper we focus on such simple safety
properties and leave liveness properties (see e.g., [29]) etc. for future work.


Definition 1 (Circuit). A circuit $C = (I, L, R, F, P)$ is defined as follows:


1. $I$: the set of Boolean input variables.

2. $L$: the set of Boolean latch variables.

3. $R = \{r_l(L) \mid l \in L\}$ is a set of reset function formulas.

4. $F = \{f_l(I, L) \mid l \in L\}$ is a set of transition function formulas, such that for
every latch $l \in L$, there is a transition function formula $f_l(I, L) \in B(I, L)$.

5. $P(I, L) \in B(I, L)$ is a formula encoding the (good states) property.


The reset functions characterise the initialisation of the circuit. This definition
of reset abstracts from the way circuits are actually reset. As a short-hand we use
$L' \simeq F(I, L)$ to denote a conjunction of the corresponding equivalences, i.e., it is
interpreted as $\bigwedge_{l \in L} l' \simeq f_l(I, L)$. For clarity, we use subscripts as in $L_i$ to denote
a copy of the latch variables $L$ in the temporal direction at some timestamp $i$,
where $L_0$ is the set of latches at timestamp 0 when the circuit is supposed to
be initialised. Note that using such transition functions to describe transition
relations implies that there will always be a successor state. The temporal
evolution of a system is expressed using the notion of unrolling, which has a specific
length and follows the transition relation at each step.


Definition 2 (Unrolling). For an unrolling depth $m \in \mathbb{N}$, the unrolling of a
circuit $C$ of length $m$ is defined as the formula $U_m = \bigwedge_{i \in [0,m)} (L_{i+1} \simeq F(I_i, L_i))$.


Note that in this definition, we use $I_i$ and $L_i$ as sets of variables, whereas $U_m$ is
a formula. For $m = 0$, the conjunction is empty, thus the formula is trivial.


Definition 3 (Initialised unrolling). An initialised unrolling of a circuit $C$,
with $C = (I, L, R, F, P)$, is defined as $U_m \land R(L_0)$, where $U_m$ is an unrolling.

We say an unrolling is safe if and only if the property holds at every
timestamp along the whole length of the unrolling.


Definition 4 (Safe unrolling). Unrolling $U_m$ of a circuit $C = (I, L, R, F, P)$
is said to be safe if

$$U_m \Rightarrow \bigwedge_{i \in [0,m]} P(I_i, L_i).$$


Definition 5 (Safe initialised unrolling). An initialised unrolling $U_m \land R(L_0)$
of a circuit $C = (I, L, R, F, P)$ is said to be safe if

$$U_m \land R(L_0) \Rightarrow \bigwedge_{i \in [0,m]} P(I_i, L_i).$$
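
To illustrate Definitions 1 to 5, the following Python sketch represents a circuit with plain Python functions and decides whether an initialised unrolling is safe by enumerating all input and latch valuations. This is only a brute-force illustration under our own assumptions: the `Circuit` class, the helper names and the two-bit counter example are hypothetical and are not part of the toolkit described in this paper, which relies on SAT and QBF solving instead.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Tuple

Valuation = Dict[str, bool]

@dataclass
class Circuit:
    inputs: Tuple[str, ...]                                   # I
    latches: Tuple[str, ...]                                  # L
    reset: Dict[str, Callable[[Valuation], bool]]             # r_l(L)
    trans: Dict[str, Callable[[Valuation, Valuation], bool]]  # f_l(I, L)
    prop: Callable[[Valuation, Valuation], bool]              # P(I, L)

def valuations(names):
    """All truth assignments to the given variable names."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))

def initial_states(c):
    """Latch valuations satisfying R(L): each latch equals its reset function."""
    return [s for s in valuations(c.latches)
            if all(s[l] == c.reset[l](s) for l in c.latches)]

def is_safe_initialised_unrolling(c, m):
    """Check that P holds at timestamps 0..m on every initialised unrolling."""
    frontier = initial_states(c)
    for t in range(m + 1):
        successors = []
        for s in frontier:
            for i in valuations(c.inputs):
                if not c.prop(i, s):
                    return False
                if t < m:
                    successors.append({l: c.trans[l](i, s) for l in c.latches})
        frontier = successors
    return True

# Hypothetical example: a two-bit counter with an enable input; the property
# states that the counter never shows the value 3 (both bits set).
counter = Circuit(
    inputs=("en",),
    latches=("b0", "b1"),
    reset={"b0": lambda s: False, "b1": lambda s: False},
    trans={
        "b0": lambda i, s: s["b0"] != i["en"],
        "b1": lambda i, s: s["b1"] != (s["b0"] and i["en"]),
    },
    prop=lambda i, s: not (s["b0"] and s["b1"]),
)

print(is_safe_initialised_unrolling(counter, 2))
print(is_safe_initialised_unrolling(counter, 3))
```

In this example the initialised unrolling of length 2 is safe, whereas the one of length 3 is not, because three enabled steps drive the counter to the value 3.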


We are now ready to introduce the notion of a combinational extension 
between two circuits. It is purely syntactic based on sharing inputs and latches. 


Definition 6 (Combinational extension). Given circuits $C = (I, L, R, F, P)$
and $C' = (I', L', R', F', P')$, $C'$ combinationally extends $C$ if $I = I'$ and $L \subseteq L'$.


As noticed above, this definition allows us to interpret the inputs and latches
of a circuit as being part of another circuit. In practice, we simply assume
that the first |L| latches of the circuit C' are mapped to those of C under
some ordering of the latches, as is the case, for instance, in the AIGER
format [7] used in the Hardware Model Checking Competition (HWMCC) [5].

To tackle the problem of generating a proof certificate for k-induction of the 
safety of a circuit C, as is the main goal of this paper, we extend it to a larger 
circuit C’ with additional “book-keeping” behaviours [1] for which we can show 
the same property by using standard induction. To ensure that the resulting 
extended circuit C’ preserves the original property, we provide a formalization 
through a combinational simulation relation between two circuits, which needs to 
be formally verified by a certifier. One important aspect of our design principles 
is to keep the complexity of the required certification procedure low, in other 
words, to be done via pure SAT solver checks or by solving a QBF with at most 
one quantifier alternation. This leads to a more complicated non-standard design 
of the certification approach, the details of which will be described in Sect. 4. 

From a practical perspective, under combinational simulation defined below
in Definition 7, we require that the transition functions on the “common” parts
of the two circuits are equivalent. For the new latches, the transition functions
are always satisfiable (as they are functions), and thus we need no constraints
on them. As a second condition we require that if the safety property P' holds
in the extended circuit, then the property P holds in the original circuit. The
last condition we need to check is that all the new latches of the extended
circuit can be initialised with some values whenever the original circuit can be
initialised, using the same values for initialising the common latches. In
other words, for all initialisations of the original circuit there is at least
one initialisation of the extended circuit with the same values for the common
latches.

Under these conditions Theorem 1 in Sect. 3 shows that if the extended circuit
(in this sense) combinationally simulates the original one and the extended
circuit is safe then the original circuit is safe as well.

With some abuse of notation, we use $\exists L$ in a Quantified Boolean Formula
(QBF) to denote existential quantification over variables in L. As usual, free
variables are (implicitly) assumed to be quantified universally.


Definition 7 (Combinational simulation). Given circuits C = (I, L, R, F,
P) and C' = (I', L', R', F', P') where C' combinationally extends C, we say that
C' combinationally simulates C, if the following holds:


1. $f_l(I, L) \equiv f'_l(I, L')$ for $l \in L$,          “transition”
2. $P'(I, L') \Rightarrow P(I, L)$, and                   “property”
3. $R(L) \Rightarrow \exists (L' \setminus L).\ R'(L')$.  “reset”


In later context when verifying the combinational simulation relation between 
two circuits, we refer to Definition 7.1 as the transition check, Definition 7.2 as 
the property check, and Definition 7.3 as the reset check. 
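
The three checks can be illustrated by enumeration on tiny circuits. This is a sketch under stated assumptions rather than the certifier itself (which uses SAT and QBF solvers): the `Circuit` tuple, the function `combinationally_simulates`, and the example circuits are hypothetical, resets are modelled as predicates, and the first `n_latches` latches of the extended circuit are assumed to be the common ones.

```python
from itertools import product
from typing import Callable, NamedTuple, Tuple

Bits = Tuple[bool, ...]

class Circuit(NamedTuple):
    """Hypothetical explicit encoding of a circuit (I, L, R, F, P)."""
    n_inputs: int
    n_latches: int
    reset: Callable[[Bits], bool]        # R(L)
    step: Callable[[Bits, Bits], Bits]   # F(I, L)
    prop: Callable[[Bits, Bits], bool]   # P(I, L)

def cube(n: int):
    return list(product((False, True), repeat=n))

def combinationally_simulates(ext: Circuit, orig: Circuit) -> bool:
    """Brute-force version of the checks in Definition 7."""
    # Combinational extension: same inputs, latches of orig are a prefix.
    if ext.n_inputs != orig.n_inputs or ext.n_latches < orig.n_latches:
        return False
    n = orig.n_latches
    for inp in cube(orig.n_inputs):
        for l_ext in cube(ext.n_latches):
            l = l_ext[:n]
            # Transition check: common latches evolve identically.
            if ext.step(inp, l_ext)[:n] != orig.step(inp, l):
                return False
            # Property check: P' implies P.
            if ext.prop(inp, l_ext) and not orig.prop(inp, l):
                return False
    # Reset check: every reset state of orig extends to a reset state of ext.
    extra = cube(ext.n_latches - n)
    return all(any(ext.reset(l + e) for e in extra)
               for l in cube(n) if orig.reset(l))

# Assumed example: latch a toggles on input e; the extension remembers a.
orig_c = Circuit(1, 1,
                 reset=lambda l: not l[0],
                 step=lambda i, l: (l[0] ^ i[0],),
                 prop=lambda i, l: not (l[0] and i[0]))
ext_c = Circuit(1, 2,
                reset=lambda l: not l[0] and not l[1],
                step=lambda i, l: (l[0] ^ i[0], l[0]),
                prop=lambda i, l: not (l[0] and i[0]) and not (l[0] and l[1]))
# A broken extension whose common latch ignores the input.
broken = ext_c._replace(step=lambda i, l: (l[0], l[0]))
```

Here `ext_c` passes all three checks against `orig_c`, whereas `broken` fails the transition check.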


3 Model Checking 


In this section, we consider model checking via k-induction. The model checking 
problem for safety properties concerns determining whether, given a circuit with 
a property P, it is the case that P holds in all reachable states, i.e., the initialised 
unrolling of a circuit of any arbitrary length is safe. 


Definition 8 (Safe circuit). Let $U_m$ be the unrolling of circuit C. C is safe
if $U_m \wedge R(L_0) \Rightarrow \bigwedge_{i \in [0,m]} P(I_i, L_i)$ holds for
all $m \in \mathbb{N}$.


Based on the above definition, we say the property P “holds” in C if the 
circuit is safe with respect to P. 


Theorem 1. Assume that the circuit C’ combinationally simulates the cir- 
cuit C. If C’ is safe, then C is safe. 


Proof. We do a proof by contradiction. Let $m \in \mathbb{N}$ be a bound for
which the claim does not hold. Thus the unrolling of length m of C' is safe
w.r.t. P', and therefore
$U'_m \wedge R'(L'_0) \Rightarrow \bigwedge_{i \in [0,m]} P'(I_i, L'_i)$ holds.
To obtain the contradiction we assume there is a satisfying assignment s of
$U_m \wedge R(L_0) \wedge \neg \bigwedge_{i \in [0,m]} P(I_i, L_i)$, which would
make C unsafe. Thus $R(L_0)$ needs to be satisfiable. Now the reset check of
Definition 7.3 implies that $R'(L'_0) \wedge R(L_0)$ is guaranteed to be
satisfiable with $L_0$ being a subset of $L'_0$. Moreover, by Definition 7.1,
the unrolling $U'_m$ of C' is also satisfiable with the transition function F
applied on the projected (“common”) component on both circuits. For the new
latches, since we use transition functions for them, their transitions are also
satisfiable (transition functions guarantee that there is always a successor
state for all states). Therefore the initialised unrolling
$R'(L'_0) \wedge U'_m$ is satisfiable. Furthermore, by our assumption,
$\bigwedge_{i \in [0,m]} P'(I_i, L'_i)$ holds. By Definition 7.1 and
Definition 7.3, the projected latches of C' stay the same as $L_i$ for all
$i \in [0, m]$, and thus by Definition 7.2 we have that
$\bigwedge_{i \in [0,m]} P(I_i, L_i)$ holds, which contradicts the choice of s.


As usual, we call a formula $\phi$ an inductive invariant of a circuit C if
$\phi$ satisfies the following conditions: (1) $R(L) \Rightarrow \phi(I, L)$,
(2) $\phi(I, L) \Rightarrow P(I, L)$, and
(3) $U_1 \wedge \phi(I_0, L_0) \Rightarrow \phi(I_1, L_1)$. As a generalisation,
k-induction looks at k steps of evolution rather than 1 step by assuming the
property holds in k consecutive timestamps at the induction step.


Definition 9 (k-inductive). Given a circuit C with a property P, define the
formula $S_k = \bigwedge_{i \in [0,k)} P(I_i, L_i)$. Then P is called
k-inductive in C if and only if the following two conditions hold:


1. $U_{k-1} \wedge R(L_0) \Rightarrow S_k$, and    “initiation”
2. $U_k \wedge S_k \Rightarrow P(I_k, L_k)$.       “consecution”


The first condition, Definition 9.1, is called the initiation check, also
bounded model checking check or simply BMC check on the initialised unrolling
of length k − 1, whereas the second condition, Definition 9.2, is referred to
as the consecution check for the unrolling of length k. Note that a 1-inductive
property is equivalent to an inductive invariant when
$\phi(I, L) = P(I, L)$.


    MODULE main
    VAR
      r : boolean;
      c0 : boolean;
      c1 : boolean;
      c2 : boolean;
    DEFINE
      a0 := TRUE;
      a1 := c0 & a0;
      a2 := c1 & a1;
      m0 := (c0 = FALSE);
      m1 := (c1 = FALSE) & m0;
      m2 := (c2 = TRUE) & m1;
      b0 := (c0 = FALSE);
      b1 := (c1 = TRUE) & b0;
      b2 := (c2 = TRUE) & b1;
    ASSIGN
      init(c0) := FALSE;
      init(c1) := FALSE;
      init(c2) := FALSE;
      next(c0) := !r & !m2 & (c0 != a0);
      next(c1) := !r & !m2 & (c1 != a1);
      next(c2) := !r & !m2 & (c2 != a2);
    SPEC
      AG !b2



Fig. 1. The SMV code for the Counter example. 


Example 1. We consider a simple example of an N-bit counter, where the counter
counts up to a modulo bound m, then it resets to zero. There is also a reset signal
which works as an enabler, such that when the signal is set to 1, the counter
is forced to reset. The property checks whether the counter value reaches b.

Fig. 2. The transition diagram of the Counter example. The initial state is “000”
(colored yellow). In the (gray) “bad” state “110” the property does not hold. (Color
figure online)


Here the exact modulo check makes the model checking problem k-inductive
(k = b − m + 1). More precisely, for N = 3, the formal description of a 3-bit
counter is given in the SMV language in Fig. 1, where m = 5, b = 6. (Note that
our example can be easily extended to integers too.) The state diagram of this
system is shown in Fig. 2. The input values are specified with the transition
relations. This model is 2-inductive, since k = 6 − 5 + 1 = 2.
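
The claim can be replayed by enumerating the eight counter values. The sketch below is our own illustration, not part of the tool: it encodes the latches (c2, c1, c0) of Fig. 1 as an integer, and the helper names `step`, `prop`, `traces`, and `k_inductive` are assumptions made for this example.

```python
STATES = range(8)        # latch values (c2, c1, c0) read as an integer
INPUTS = (False, True)   # the reset input r

def step(r, c):
    """Transition of Fig. 1: go to 0 on r or when c == 4 (m2), else increment."""
    return 0 if (r or c == 4) else (c + 1) % 8

def prop(c):
    """AG !b2: the value 6 (binary 110) is never reached."""
    return c != 6

def traces(length, starts):
    """All latch sequences c_0, ..., c_length that follow the transition."""
    result = [(c,) for c in starts]
    for _ in range(length):
        result = [t + (step(r, t[-1]),) for t in result for r in INPUTS]
    return result

def k_inductive(k):
    """Both checks of Definition 9, by enumeration (assumes k >= 1)."""
    # Initiation: every initialised unrolling of length k - 1 satisfies P.
    initiation = all(prop(c) for t in traces(k - 1, [0]) for c in t)
    # Consecution: k consecutive good states are followed by a good state.
    consecution = all(prop(t[-1]) for t in traces(k, STATES)
                      if all(prop(c) for c in t[:-1]))
    return initiation and consecution
```

Under this encoding, `k_inductive(1)` fails because the unreachable value 5 steps to 6, while `k_inductive(2)` succeeds because 5 has no predecessor: the value 4 always returns to 0.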


4 Certification 


In our suggested approach, certifying model checking results concerns finding and 
checking an inductive invariant which implies the original specification, which 
in our case, is the safety property P. To tackle the problem of certifying k- 
induction-based model checking for any given circuit, in this section, we redirect 
the problem to generating a simple inductive invariant from a k-witness circuit, 
in which the original circuit is combinationally simulated. 

We start by defining the formalism of a k-witness circuit. The main idea is 
to record the previous k — 1 states and inputs of the circuit observed during the 
execution, “flattening” the k-induction procedure back to normal induction of 
a larger circuit. As a result, the size of the circuit increases by a factor of k, 
where k is the constant used in the k-induction scheme. The k-witness circuit 
has k local components of inputs and latches. Each component can be seen as 
representing a state in the original circuit. Whenever a new state is saved, the 
oldest one is discarded. 

One of the key technical challenges is the proper initialisation of the k-witness
circuit. We use an additional k initialisation bits for indicating which components
of the circuit have been initialised. This helps to establish the combinational
simulation relation later. We say a component is initialised if its initialisation
bit is $\top$. At initialisation, the k-witness circuit can be either fully or
partially initialised. Figure 3 displays three ways of initialising the
components. In the case of full initialisation, the circuit pre-computes k steps
of the original circuit as the initial state of the k-witness circuit. Thus,
intuitively, in the full initialisation case the initial state of the k-witness
circuit encodes the states reachable in the k-step initialised BMC unrolling of
the original circuit. In partial initialisation scenarios the circuit instead
pre-computes an initialised BMC unrolling for fewer steps, where some components
are left uninitialised. In the final case where there are no pre-computed steps,
the circuit simply runs from an original initial state, leaving all the other
components fully uninitialised.

In the definitions below, we use the superscript $i$ in $L^i$ to denote a copy
of latches L in the spatial direction, such that we introduce a set of new latch
variables for every $L^i$, where $l^i \in L^i$ is the corresponding copy of
$l \in L$, and similarly for inputs. We refer to $l^i$ as some latch in $L^i$,
where $i$ is the index of the latch set $L^i$. The formal definition of the
k-witness circuit is given below. We continue to use subscripts for the temporal
direction.


Definition 10 (k-witness circuit). Given a circuit C = (I, L, R, F, P), and
$k \in \mathbb{N}^+$, the k-witness circuit C' = (I', L', R', F', P') of C is
defined as follows:


1. $I' = I$. For simplicity we also refer to I' as $X^{k-1}$.
2. $L' = X^0 \cup \dots \cup X^{k-2} \cup L^0 \cup \dots \cup L^{k-1} \cup B$,
   such that,
   (a) $X^i$ is a copy of the original inputs, for all $i \in [0, k-2]$.
   (b) $L^i$ is a copy of the original latches, for all $i \in [0, k-1]$.
   (c) $B = \{b^0, \dots, b^{k-1}\}$ is the set of initialisation bits.
3. The reset function $R' = \{r'_l(L') \mid l \in L'\}$ is defined as follows:
   (a) For $x \in X^0 \cup \dots \cup X^{k-2}$, $r'_x = x$.
   (b) For $i \in [1, k-1)$, $u^i = R(L^i) \vee u^{i+1}$, and
       $u^{k-1} = R(L^{k-1})$.
   (c) For $l \in L^0$, $r'_l = ite(u^1, l, r_l(L^0))$.
   (d) For $i \in [1, k)$, $r'_{l^i} = ite(u^i, l^i, f_l(X^{i-1}, L^{i-1}))$.
   (e), (f) [the reset functions of the remaining initialisation bits are
       missing from the source text]
   (g) For $i \in [1, k-1)$,
       $r'_{b^i} = b^{i-1} \vee (R(L^i) \wedge \neg u^{i+1})$.
4. $F' = \{f'_l(I', L') \mid l \in L'\}$ is defined as follows:
   (a) For $i \in [0, k-1)$, $f'_{x^i}(I', L') = x^{i+1}$.
   (b) For $l \in L^{k-1}$, $f'_l(I', L') = f_l(X^{k-1}, L^{k-1})$.
   (c) For $i \in [0, k-1)$, $f'_{l^i}(I', L') = l^{i+1}$.
   (d) For $i \in [0, k-1)$, $f'_{b^i}(I', L') = b^{i+1}$, and
       $f'_{b^{k-1}}(I', L') = b^{k-1}$.
5. The property P' is defined as
   $P'(I', L') = \bigwedge_{i \in [0,4]} p_i(I', L')$ such that:
   (a) For $i \in [0, k-1)$, $h^i = (L^{i+1} \simeq F(X^i, L^i))$.
   (b) $p_0(I', L') = \bigwedge_{i \in [0,k-1)} (b^i \rightarrow b^{i+1})$.
   (c) $p_1(I', L') = \bigwedge_{i \in [0,k-1)} (b^i \rightarrow h^i)$.
   (d) $p_2(I', L') = \bigwedge_{i \in [0,k)} (b^i \rightarrow P(X^i, L^i))$.
   (e) $p_3(I', L') = \bigwedge_{i \in [1,k)} ((\neg b^{i-1} \wedge b^i) \rightarrow R(L^i))$.
   (f) $p_4(I', L') = b^{k-1}$.


In Definition 10 we list five parts of the k-witness circuit. For clarity, we
explain each part in more detail in the following text:


1. The set of inputs is identical to that of the original circuit. 

2. The set of latches consists of the original latches, k initialisation bits, and 
an additional k — 1 copies of inputs and latches which are introduced to save 
observations of previous states. 

3. The reset function is defined to allow non-deterministic initialisation (see
Fig. 3), where we use helper variables $u^i$ for a more compact encoding. The
formula $u^i$ is satisfied whenever a component younger than the ith has the
same reset value as the original circuit. The reset functions of the $X^i$
latches (for $i < k - 1$) ensure they are initialised in a non-deterministic
fashion. As for the initialisation bits B, their reset values are
deterministic, depending on the initialisation status of the components.

4. The transition function of the (k − 1)th copy of latches is identical to the
original transition function, while every older component simply saves the
value of its one timestep younger component.

5. Finally, the property is composed of five sub-properties, where $h^i$ is
satisfied whenever the two adjacent components follow the original transition
relation.


Figure 4 illustrates a comparison of variable structures of the original circuit
and its k-witness (this also suggests their combinational extension relation). The
area marked yellow (left box and top right box on the right) consists of the same
set of variables. We consider each pair $(X^i, L^i)$ as a component in the
circuit and refer to $(X^{k-1}, L^{k-1})$ as the most recent component (youngest
copy), and $(X^0, L^0)$ as the oldest component (copy). Additionally we also
refer to the inputs I' as $X^{k-1}$ for convenience.

The property P' is comprised of five sub-properties. The monotonicity property
$p_0$ expresses the monotonic nature of the initialisation bits. Intuitively, if
a component is initialised, all components younger than it should also be
initialised. The transition property $p_1$ expresses that every initialised
component has to follow the transition relation in the original circuit. Of
particular interest is the k-safety property $p_2$, which says the original
property P needs to be satisfied in every initialised component. The reset
property $p_3$ expresses that in the case of partial initialisation, the oldest
initialised component needs to satisfy the original reset function. Finally,
$p_4$ expresses that at least the youngest component should have the
initialisation bit set.
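
To make the shifting transition and the five sub-properties concrete, the following sketch evaluates them for the counter of Fig. 1 and checks, by enumeration, that P' is preserved by one step of the k-witness circuit. It is an illustration under stated assumptions, not the CERTIFAIGER implementation: latch values are encoded as integers, the reset of the k-witness circuit is not modelled, and all function names are hypothetical.

```python
from itertools import product

def step(r, c):
    """Original transition F of the counter: reset on r or at 4, else increment."""
    return 0 if (r or c == 4) else (c + 1) % 8

def prop(r, c):
    """Original property P: the bad value 6 is not reached."""
    return c != 6

def reset(c):
    """Original reset R: all latches are FALSE."""
    return c == 0

def witness_step(k, inp, state):
    """Transition F': shift every component one slot towards index 0."""
    xs, ls, bs = state                      # saved inputs, latch copies, init bits
    all_x = xs + (inp,)                     # X^{k-1} is the current input I'
    new_xs = all_x[1:]                      # x^i takes the value of x^{i+1}
    new_ls = ls[1:] + (step(inp, ls[-1]),)  # youngest copy follows F
    new_bs = bs[1:] + (bs[-1],)             # b^{k-1} keeps its value
    return new_xs, new_ls, new_bs

def witness_prop(k, inp, state):
    """Property P' as the conjunction p0 and ... and p4."""
    xs, ls, bs = state
    all_x = xs + (inp,)
    h = [ls[i + 1] == step(all_x[i], ls[i]) for i in range(k - 1)]
    p0 = all((not bs[i]) or bs[i + 1] for i in range(k - 1))
    p1 = all((not bs[i]) or h[i] for i in range(k - 1))
    p2 = all((not bs[i]) or prop(all_x[i], ls[i]) for i in range(k))
    p3 = all((not (not bs[i - 1] and bs[i])) or reset(ls[i])
             for i in range(1, k))
    p4 = bs[k - 1]
    return p0 and p1 and p2 and p3 and p4

def witness_consecution(k):
    """Brute-force check that P' at timestamp 0 implies P' at timestamp 1."""
    bools = (False, True)
    for xs in product(bools, repeat=k - 1):
        for ls in product(range(8), repeat=k):
            for bs in product(bools, repeat=k):
                state = (xs, ls, bs)
                for inp0 in bools:
                    if not witness_prop(k, inp0, state):
                        continue
                    nxt = witness_step(k, inp0, state)
                    if not all(witness_prop(k, inp1, nxt) for inp1 in bools):
                        return False
    return True
```

For this counter the check fails for k = 1 (the single component may hold the value 5, which steps to 6) and passes for k = 2, matching the fact that the original property is 2-inductive but not 1-inductive.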

We now show the combinational simulation relation between the original 
circuit and its k-witness circuit. 


Theorem 2. The circuit C is combinationally simulated by its k-witness circuit. 


Proof. By the construction in Definition 10, the inputs stay the same in the
k-witness circuit C', and the new latches are a superset of the original ones (the
youngest component in C'). Thus by Definition 6, C' combinationally extends C.
Based on Definition 10.4, the transition function of $L^{k-1}$ is identical to
the original one, which satisfies Definition 7.1. In the new property, $p_4$ and
$p_2$ together imply $P(X^{k-1}, L^{k-1})$. In other words, the original
property holds in the most recent component. This then satisfies Definition 7.2.
By Definition 10, for every satisfying assignment of $R(L)$, the same assignment
satisfies $R'(L)$ on the common latches (the youngest component). For all the
new latches we observe the following. Because the reset of the newest component
is satisfiable with the same assignment as in the original circuit, we can see
that $u^{k-1}$ is true in the k-witness circuit and therefore all other $u^i$
are also true. Therefore all the ite-statements of the reset definition become
trivially satisfiable. To complete the argument, by Definition 10.3, all the
initialisation bits can now be set to $\bot$ except $b^{k-1}$ which can be set
to $\top$. A satisfying assignment of $R'(L')$ can thus be directly constructed
(deterministically in polynomial time) from any satisfying assignment of
$R(L)$. This implies the reset condition of Definition 7.3 holds. (Sidenote:
This implies that the QBF check needed in the combinational simulation relation
could potentially be solved easily in practice for these k-witness circuits.)
Therefore C' combinationally simulates C. □


Fig. 3. The diagram shows three possible initial states of C'. Here (1) illustrates
1-initialisation, (2) is i-initialisation, and (3) full initialisation. The grey
areas are the uninitialised components (the “don’t care”s).


Fig. 4. The structure of input and latch variables in C and C'. (Color figure online)


In the following, we present the main result of this paper on the relationship 
between a circuit C and its k-witness circuit C’ in terms of k-induction. 


Theorem 3. Given a circuit C, a fixed $k \in \mathbb{N}^+$, and its k-witness
circuit C', P is k-inductive in C iff P' is 1-inductive in C'.


Proof. We consider the two k-inductive checks in Definition 9 for both directions.
In Theorem 4 we show that the BMC check (of the initialised unrolling of length
k − 1) in C passes, if and only if the same check (of the initialised unrolling of
length 0) in C' also passes. In Theorem 5 we prove that if the consecution check
of C' passes, then the consecution check also passes in C. Lastly, Theorem 6
shows that if P is k-inductive in C, then the consecution check of P' using the
unrolling of length 1 passes in C'. By combining them together, we conclude P
is k-inductive in C iff P' is 1-inductive in C'.


For the BMC check in the two circuits, we need to analyse three separate cases 
as shown in Fig. 3, which correspond to Lemmas 2, 3, and 4, respectively. But before 
this we need a technical Lemma 1 on the initialisation bits. In the following context, 
we consider a given circuit C, and its k-witness circuit C’ with a fixed k. 


Lemma 1. For the initialised unrolling of length 0 of the k-witness circuit C',
the reset values of the initialisation bits $B_0$ are deterministic and depend
only on the component with the largest index $i \in [0, k)$ for which
$R(L^i_0)$ is satisfied.


Proof. Firstly, we define $S = \{i \mid R(L^i_0)\}$, based on which we consider
two cases.

(1). By Definition 10.3(c), if $\neg u^1_0$, then $0 \in S$. In this case,
$b^0_0 = \top$ by Definition 10.3(f), and by Definition 10.3(e)(g),
$b^1_0, \dots, b^{k-1}_0$ are all set to $\top$.

(2). Otherwise we consider $u^1_0$, where S contains at least some
$i \in [1, k)$. Let m be the maximum index in S, and $m \neq 0$. Since
$R(L^m_0)$, $u^m_0$ is satisfied, so are $u^{m-1}_0, \dots, u^1_0$, while
$u^{m+1}_0, \dots, u^{k-1}_0$ are not. In Definition 10.3(g), for all
$i \in S$, $R(L^i_0) \wedge \neg u^{i+1}_0$ is only satisfied when $i = m$,
thus $b^m_0 = \top$. Therefore $b^i_0 = \top$ for all $i \in [m + 1, k)$. By
Definition 10.3(f), $b^0_0 = \bot$, therefore for all $i \in [1, m)$,
$b^i_0 = \bot$.


Initialisation bits are indicators for the initialisation status of the k-witness
circuit. We observe that the sub-properties $p_0, \dots, p_3$ of the k-witness
circuit trivially hold for uninitialised components (i.e., those for which the
initialisation bit is $\bot$), while $p_4$ solely depends on $b^{k-1}$.


Lemma 2. If the initialised unrolling of length k —1 of the original circuit C is 
safe, the initialised unrolling of length 0 of the k-witness circuit C’ is also safe, 
in the case of 1-initialisation. 


Proof. Assume
$U_{k-1} \wedge R(L_0) \Rightarrow \bigwedge_{i \in [0,k)} P(I_i, L_i)$ such
that the initialised unrolling of C is safe. In the case of 1-initialisation, we
consider $R'(L'_0) \wedge R(L^{k-1}_0)$ as the initialised unrolling of C', as
$U'_0$ is trivial. By Lemma 1 and Definition 10.3, for the initialisation bits,
only $b^{k-1}_0$ is set to $\top$ and the rest remain $\bot$. The values of
$B_0$ then satisfy $p_0(I'_0, L'_0)$, $p_1(I'_0, L'_0)$, $p_4(I'_0, L'_0)$
trivially. Every satisfying assignment of $R'(L'_0) \wedge R(L^{k-1}_0)$
satisfies $R(L_0)$ with $L_0 = L^{k-1}_0$, $I_0 = X^{k-1}_0$. Similar to our
argument in Theorem 1, $U_{k-1} \wedge R(L_0)$ is then also satisfiable. By our
assumption, $P(X^{k-1}_0, L^{k-1}_0)$ is thus satisfied. The premise of
$p_2(I'_0, L'_0)$ is only satisfied for $b^{k-1}_0$, and with the same
assignment satisfying $P(X^{k-1}_0, L^{k-1}_0)$, $p_2(I'_0, L'_0)$ is also
satisfied. Lastly, the premise of $p_3(I'_0, L'_0)$ is only satisfied for
$\neg b^{k-2}_0 \wedge b^{k-1}_0$, and since $R(L^{k-1}_0)$, $p_3(I'_0, L'_0)$
is satisfied. Therefore we have $P'(I'_0, L'_0)$.


Lemma 3. If the initialised unrolling of length k − 1 of the original circuit C
is safe, the initialised unrolling of length 0 of the k-witness circuit C' is
also safe, in the case of i-initialisation.


Proof. Firstly, we assume
$U_{k-1} \wedge R(L_0) \Rightarrow \bigwedge_{i \in [0,k)} P(I_i, L_i)$. In the
case of i-initialisation, we consider
$R'(L'_0) \wedge R(L^m_0) \wedge \neg u^{m+1}_0$ as the initialised unrolling
of C', where $m \in [1, k-1)$ is the largest index for which $R(L^m_0)$ is
satisfied. As we showed in Lemma 1, $b^m_0, \dots, b^{k-1}_0$ are set to $\top$
while $b^0_0, \dots, b^{m-1}_0$ are $\bot$. Following Definition 10.3,
$L^i_0 \simeq F(X^{i-1}_0, L^{i-1}_0)$ for all $i \in (m, k)$, while all
components older than m are uninitialised. Every satisfying assignment of
$R'(L'_0) \wedge R(L^m_0) \wedge \neg u^{m+1}_0$ also satisfies
$\bigwedge_{i \in [0,k-m-1)} (L_{i+1} \simeq F(I_i, L_i)) \wedge R(L_0)$ with
$I_{i-m} = X^i_0$, $L_{i-m} = L^i_0$ for all $i \in [m, k)$. In the rest of the
proof, we fix the assignment satisfying
$R'(L'_0) \wedge R(L^m_0) \wedge \neg u^{m+1}_0$. Similar to our argument in
Theorem 1, $U_{k-1} \wedge R(L_0)$ is satisfiable with our fixed assignment. By
our assumption, $\bigwedge_{i \in [m,k)} P(X^i_0, L^i_0)$ is then satisfied. We
now consider $P'(I'_0, L'_0)$. As the premise of $p_2(I'_0, L'_0)$ is only
satisfied for $b^m_0, \dots, b^{k-1}_0$, $p_2(I'_0, L'_0)$ is satisfied.
Similarly for the transition property, with
$L^i_0 \simeq F(X^{i-1}_0, L^{i-1}_0)$ for all $i \in (m, k)$,
$p_1(I'_0, L'_0)$ is satisfied. Given the values of $B_0$, the monotonicity
property is satisfied. In addition, $p_4(I'_0, L'_0)$ is also satisfied as
$b^{k-1}_0 = \top$. Finally, the premise of $p_3(I'_0, L'_0)$ is only satisfied
for $\neg b^{m-1}_0 \wedge b^m_0$, and as we already have $R(L^m_0)$, $p_3$ is
satisfied.


Lemma 4. If the initialised unrolling of length k − 1 of the original circuit C
is safe, the initialised unrolling of length 0 of the k-witness circuit C' is
also safe, in the case of full initialisation.


Proof. We assume
$U_{k-1} \wedge R(L_0) \Rightarrow \bigwedge_{i \in [0,k)} P(I_i, L_i)$ for the
original circuit. Since we consider full initialisation,
$R'(L'_0) \wedge R(L^0_0) \wedge \neg u^1_0$ is the initialised unrolling of C'.
Following Definition 10.3, $L^i_0 \simeq F(X^{i-1}_0, L^{i-1}_0)$ for all
$i \in [1, k)$. Every satisfying assignment of
$R'(L'_0) \wedge R(L^0_0) \wedge \neg u^1_0$ satisfies
$U_{k-1} \wedge R(L_0)$ with $I_i = X^i_0$, $L_i = L^i_0$, for all
$i \in [0, k)$. The rest of the proof follows the same logic as in Lemma 3.


Lemma 5. If the BMC check for the unrolling of length k — 1 of the original 
circuit C passes, then the BMC check for the unrolling of length 0 of the k-witness 
circuit C’ also passes. 


Proof. Based on Definition 10.3, we consider the BMC check for all possible
initial states. Lemmas 2, 3 and 4 cover the case-split over all initial states
of C' based on whether each component satisfies the original reset function
$R(L^i_0)$ or not. We show that the BMC check of C' passes under the same
assumption for the three initialisation cases respectively. In particular, our
construction in Definition 10.3 does not allow all components to be
uninitialised, in which case $R'(L'_0)$ becomes unsatisfiable (more
specifically, $R'(L^0_0)$ is unsatisfiable). We conclude the BMC check of the
initialised unrolling of length 0 passes in C'.

We proceed to prove the opposite direction of the BMC check for C and C'
by considering the reset status in the k-witness circuit.


Lemma 6. If the BMC check for the unrolling of length 0 of the k-witness circuit 
C’ passes, then the BMC check for the unrolling of length k — 1 of the original 
circuit C also passes. 


Proof. We assume the BMC check passes in the k-witness circuit,
$R'(L'_0) \Rightarrow P'(I'_0, L'_0)$. We do a proof by contradiction by
assuming the BMC check of length k − 1 fails for the original circuit. Thus
there exists a satisfying assignment s of
$U_{k-1} \wedge R(L_0) \wedge \neg \bigwedge_{i \in [0,k)} P(I_i, L_i)$. We can
construct a satisfying assignment of $R'(L'_0)$ as follows. Let $a \in [0, k)$
be some index for which $\neg P(I_a, L_a)$ is satisfied. Let $m \in [0, a]$ be
the index for which $R(L_m) \wedge \neg \bigvee_{i \in (m,a]} R(L_i)$ is
satisfied. Let $X^{k-1-i}_0 = I_{a-i}$, $L^{k-1-i}_0 = L_{a-i}$,
$b^{k-1-i}_0 = \top$ for all $i \in [0, a - m]$. The rest of the initialisation
bits of $B_0$ are set to $\bot$. By Definition 2, we have
$L_{i+1} \simeq F(I_i, L_i)$ for all $i \in [m, a)$, which satisfies
Definition 10.3(d). As our construction satisfies $R'(L'_0)$, by our
assumption, $P'(I'_0, L'_0)$ is satisfied. By Theorem 2, $P(I_a, L_a)$ is
satisfied. Since we assume s satisfies $\neg P(I_a, L_a)$, we have reached a
contradiction.


As an immediate consequence of Lemma 5 and 6, the BMC check of C passes 
iff the same check passes in C’. We record the result in the following Theorem. 


Theorem 4. The BMC check for the unrolling of length 0 of the k-witness
circuit C' passes, if and only if the BMC check for the unrolling of length
k − 1 of the original circuit C passes.


Fig. 5. The diagram shows the consecution check in C and C'.


We show in Fig. 5 an illustration of the consecution check in both circuits. 


Theorem 5. If the consecution check for the unrolling of length 1 of the
k-witness circuit C' passes, then the consecution check for the unrolling of
length k of the original circuit C passes too.


Proof. We assume $U'_1 \wedge P'(I'_0, L'_0) \Rightarrow P'(I'_1, L'_1)$ holds.
We then do a proof by contradiction by assuming that the consecution check for
the original circuit fails. Thus there is a satisfying assignment s of the
formula
$U_k \wedge \bigwedge_{i \in [0,k)} P(I_i, L_i) \wedge \neg P(I_k, L_k)$. Based
on s, we have a satisfying assignment for $U'_1 \wedge P'(I'_0, L'_0)$ as
follows. Let $X^i_0 = I_i$, $L^i_0 = L_i$, and $b^i_0 = \top$ for all
$i \in [0, k)$. Let $X^{i-1}_1 = I_i$, $L^{i-1}_1 = L_i$, $b^{i-1}_1 = \top$ for
all $i \in [1, k]$. We now show this satisfies
$L'_1 \simeq F'(I'_0, L'_0)$. Since $X^{i-1}_1 = I_i = X^i_0$ and
$L^{i-1}_1 = L_i = L^i_0$ for all $i \in [1, k)$, Definition 10.4(a) and
Definition 10.4(c) are satisfied. Since s satisfies $U_k$, by Definition 2, it
satisfies $L_k \simeq F(I_{k-1}, L_{k-1})$. With $X^{k-1}_0 = I_{k-1}$,
$L^{k-1}_0 = L_{k-1}$, and $L^{k-1}_1 = L_k$, we have
$L^{k-1}_1 \simeq F(X^{k-1}_0, L^{k-1}_0)$, and thus Definition 10.4(b). As for
the initialisation bits, since all of them are set to $\top$ in both $B_0$ and
$B_1$, Definition 10.4(d) is satisfied. As a result, $U'_1$ is satisfied, and we
continue to show the same assignment satisfies $P'(I'_0, L'_0)$. Similar to our
proof in Lemma 3, the values of $B_0$ satisfy $p_0(I'_0, L'_0)$ and
$p_4(I'_0, L'_0)$ immediately. As the premise of $p_3(I'_0, L'_0)$ is
unsatisfiable, $p_3(I'_0, L'_0)$ trivially holds. Since $U_k$ is satisfied, by
Definition 2, we have $L_{i+1} \simeq F(I_i, L_i)$ which satisfies $h^i_0$ for
all $i \in [0, k-1)$, thus also $p_1(I'_0, L'_0)$. Lastly, since $P(I_i, L_i)$
is satisfied for all $i \in [0, k)$, the original property is satisfied in every
component $P(X^i_0, L^i_0)$, resulting in the satisfaction of
$p_2(I'_0, L'_0)$. By our initial assumption, $P'(I'_1, L'_1)$ is satisfied. By
Theorem 2, we have $P(X^{k-1}_1, L^{k-1}_1)$, thus $P(I_k, L_k)$. We reach a
contradiction here. We can therefore conclude the consecution check of the
original circuit passes.


Lemma 7. If the safety property P is k-inductive in the original circuit C, the
consecution check of the unrolling of length 1 passes in the k-witness circuit C',
given that $L'_0$ is partially initialised.


Proof. Assume P is k-inductive in C. Let $U'_1$ be the unrolling of C', and
$m \in [1, k)$ is some index such that $b^0_0, \dots, b^{m-1}_0$ are set to
$\bot$, while $b^m_0, \dots, b^{k-1}_0$ are set to $\top$ (as we consider
partial initialisation here). We do a proof by contradiction, and assume there
is a satisfying assignment s of the negation of the consecution check formula
$U'_1 \wedge P'(I'_0, L'_0) \wedge \neg P'(I'_1, L'_1)$. Since we assume
$P'(I'_0, L'_0)$, it implies $R(L^m_0)$, based on $p_3(I'_0, L'_0)$. We also
have $L^{i+1}_0 \simeq F(X^i_0, L^i_0)$ for $i \in [m, k-1)$, based on
$p_1(I'_0, L'_0)$. Furthermore, $U'_1$ implies $L'_1 \simeq F'(I'_0, L'_0)$,
and by Definition 10.4, $L^{k-1}_1 \simeq F(X^{k-1}_0, L^{k-1}_0)$. Therefore
the same assignment satisfies $U_{k-1} \wedge R(L_0)$ where
$I_{i-m} = X^i_0$, $L_{i-m} = L^i_0$, for all $i \in [m, k)$, and
$I_{k-m} = I'_1$, $L_{k-m} = L^{k-1}_1$. By our assumption that the BMC check
passes in C, we have $P(X^i_0, L^i_0)$ for all $i \in [m, k)$ and
$P(I'_1, L^{k-1}_1)$.

We can then proceed to prove $P'(I'_1, L'_1)$ is indeed satisfied. Similar to
our proof in Theorem 5, based on Definition 10.4, $b^i_1 = \top$ for all
$i \in [m, k)$ while $b^i_1 = \bot$ for all $i \in [0, m)$. Additionally,
$X^i_1 = X^{i+1}_0$, $L^i_1 = L^{i+1}_0$ for $i \in [0, m-1)$. The rest of the
proof follows the same logic as Theorem 5 for showing $P'(I'_1, L'_1)$ is
satisfied. We then reach a contradiction here, and thus conclude the
consecution check for C' passes in this case.


Lemma 8. If the consecution check for the unrolling of length k passes in the
original circuit C, the consecution check for the unrolling of length 1 passes in
the k-witness circuit C', given that $L'_0$ is fully initialised.


Proof. Let $U'_1$ be the unrolling of C' with $b^0_0, \dots, b^{k-1}_0$ all set
to $\top$. Similar to Lemma 7, we do a proof by contradiction, and assume there
is a satisfying assignment s of
$U'_1 \wedge P'(I'_0, L'_0) \wedge \neg P'(I'_1, L'_1)$. By the transition
property $p_1(I'_0, L'_0)$, the components follow the transition function F,
such that $L^{i+1}_0 \simeq F(X^i_0, L^i_0)$ for all $i \in [0, k-1)$. Similar
to our argument in Lemma 7, $U'_1$ implies
$L^{k-1}_1 \simeq F(I'_0, L^{k-1}_0)$. We also have
$\bigwedge_{i \in [0,k)} P(X^i_0, L^i_0)$ based on $p_2(I'_0, L'_0)$ and the
values of $B_0$. The same assignment thus satisfies
$U_k \wedge \bigwedge_{i \in [0,k)} P(I_i, L_i)$ where $L_i = L^i_0$ and
$I_i = X^i_0$ for all $i \in [0, k)$ and $I_k = I'_1$, $L_k = L^{k-1}_1$. Based
on our assumption that the consecution check of C passes, we have
$P(I'_1, L^{k-1}_1)$. Following the same reasoning as in Lemma 7, after one
transition, $b^i_1 = \top$ for all $i \in [0, k)$, and $X^i_1 = X^{i+1}_0$,
$L^i_1 = L^{i+1}_0$ for $i \in [0, k-1)$.

We can now show $P'(I'_1, L'_1)$ is satisfied. The k-safety property
$p_2(I'_1, L'_1)$ is satisfied as we have proved $P(X^i_1, L^i_1)$ for all
$i \in [0, k)$. The transition property $p_1(I'_1, L'_1)$ is preserved, as
$U_k$ is satisfied which implies $L^{i+1}_1 \simeq F(X^i_1, L^i_1)$. Based on
the values of $B_1$, $p_0(I'_1, L'_1)$, $p_3(I'_1, L'_1)$, $p_4(I'_1, L'_1)$
are satisfied immediately. We conclude that $P'(I'_1, L'_1)$ is satisfied, thus
we reach a contradiction. Therefore the consecution check for C' passes in this
case.


Theorem 6. If both k-induction checks pass in the original circuit C, then the 
consecution check of the unrolling of length 1 in the k-witness circuit C’ passes. 


Proof. First of all, we assume both checks pass in C. We then do a proof by
contradiction by assuming there is a satisfying assignment s for the negation
of the consecution check
$U'_1 \wedge P'(I'_0, L'_0) \wedge \neg P'(I'_1, L'_1)$. Since s satisfies
$U'_1 \wedge P'(I'_0, L'_0)$, we consider two separate cases where the property
$P'(I'_0, L'_0)$ is satisfied: full initialisation or partial initialisation.
Note that when all $b^0_0, \dots, b^{k-1}_0$ are set to $\bot$,
$P'(I'_0, L'_0)$ is not satisfied. Therefore applying Lemma 8 and Lemma 7
together, we conclude if both k-induction checks pass in C, the consecution
check of the unrolling of length 1 in the k-witness circuit also passes.


We briefly discuss why the k-witness circuit is linear in the size of the original
circuit, and the value k. If we consider the circuit size in terms of gate numbers,
the number of latches and inputs increases by a factor of approximately k. The
transition functions are copied k − 1 times, i.e., k − 2 times for reset in
Definition 10.3(d), and once more in 10.4(b), while the k − 2 copies in the
property part 10.5(a) have the same arguments and can be shared. For the reset
predicates, defining $R(L^i)$ is linear in the number of the latches, while
$u^i$ is linear in k. We apply the same logic when defining the property,
therefore we conclude our construction is linear in the size of the circuit
and k.
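
As a rough illustration of this growth, the latch count of the k-witness circuit can be read off Definition 10.2: k − 1 copies of the inputs, k copies of the latches, and k initialisation bits. The instantiation for the counter of Fig. 1 (one input, three latches, k = 2) is our own worked example and not a figure reported by the authors.

```latex
|L'| = (k-1)\,|I| + k\,|L| + k
\qquad\text{e.g.}\qquad
|L'| = 1 \cdot 1 + 2 \cdot 3 + 2 = 9 .
```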


5 Implementation 


Based on our new construction we implemented CERTIFAIGER [12], which works
as a tool suite comprised of multiple components as shown in Fig. 6. The tool
takes as inputs a circuit which contains a safety property given in AIGER
format [7] and a value k provided by a k-induction-based model checker which
outputs a positive model checking result. Upon invocation, the inputs are
passed on internally to the k-witness generator, which parses the AIGER file
and generates a k-witness circuit as defined in Definition 10. The new safety
property is a simple inductive invariant (to be verified) for the k-witness
circuit. We extended the reset logic definition of the existing AIGER format
defined by the authors of [7] to enable reset functions, whereas all previous
AIGER versions only allow reset values to be 0, 1, or uninitialised. The
k-witness circuit from the k-witness generator is given in this extended AIGER
format.


[Figure 6: block diagram of CERTIFAIGER. The inputs C and k feed the k-witness
generator. The combinational simulation checker and the inductive invariant checker
emit the formulas $\phi_{reset}$, $\phi_{prop}$, $\phi_{trans}$, $\phi_{consist}$, $\phi_{consec}$, and $\phi_{init}$.
$\phi_{reset}$ is passed to a QBF solver (result T/F); each of the other five formulas is
passed to a SAT solver (result S/U).]


Fig. 6. The architecture of CERTIFAIGER. C is the input circuit in AIGER format and 
k is the value given by a k-induction-based model checker. The final outputs of the 
SAT solvers are given in the form of S/U, for satisfiable or unsatisfiable. The QBF 
solver outputs true or false (T/F) as the result. 


To verify the inductive invariant $\phi(I, L)$, as discussed in Sect. 3, our certifier
generates three conditions. (Note that here we are only looking at extended
circuits, therefore we use L instead of L’.)


Condition     | Formula                                                  | The inductive invariant ...
“initiation”  | $R(L) \Rightarrow \phi(I, L)$                            | ... must hold at all initial states
“consistency” | $\phi(I, L) \Rightarrow P(I, L)$                         | ... must hold at all good states
“consecution” | $U_1 \wedge \phi(I_0, L_0) \Rightarrow \phi(I_1, L_1)$   | ... is preserved during the transition


In our implementation, the latch variables used in the inductive invariant
are updated with their next state literals after each transition. The consistency
condition is trivial here, as the inductive invariant is exactly the property
in the k-witness circuit, although this is specific to our construction.

Our certifier generates for each of the three conditions a (combinational)
AIGER circuit which is then checked by a SAT solver. In our implementation, 
we used the SAT solver Kissat [6] for checking validity of the formulas after they 
have been converted to CNF by invoking AIGTOCNF from the AIGER library. 
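To make the three conditions concrete, the following is a minimal Python sketch that checks initiation, consistency, and consecution by brute force on a toy two-latch counter. It is only an illustration under stated assumptions: the function names (`check_inductive_invariant`, `step_mod3`, and so on) are hypothetical, the circuit is enumerated explicitly rather than encoded in AIGER, and no SAT solver is involved, so it does not reflect how CERTIFAIGER itself is implemented.

```python
from itertools import product


def check_inductive_invariant(n_latches, n_inputs, reset, step, prop, inv):
    """Brute-force the three conditions on a tiny explicit circuit model.

    reset(L), prop(I, L) and inv(I, L) return booleans; step(I, L) returns the
    next latch tuple. All names are illustrative, not part of any real tool.
    """
    latches = list(product([False, True], repeat=n_latches))
    inputs = list(product([False, True], repeat=n_inputs))
    return {
        # R(L) => inv(I, L)
        "initiation": all(inv(i, l) for l in latches if reset(l) for i in inputs),
        # inv(I, L) => P(I, L)
        "consistency": all(prop(i, l) for l in latches for i in inputs if inv(i, l)),
        # U_1 and inv(I_0, L_0) => inv(I_1, L_1), with L_1 = step(I_0, L_0)
        "consecution": all(
            inv(i1, step(i0, l0))
            for l0 in latches for i0 in inputs if inv(i0, l0)
            for i1 in inputs
        ),
    }


def value(latches):
    """Interpret a latch tuple as an unsigned integer (bit 0 first)."""
    return sum(bit << pos for pos, bit in enumerate(latches))


def encode(number, width=2):
    """Inverse of value() for a fixed number of latches."""
    return tuple(bool((number >> pos) & 1) for pos in range(width))


def is_reset(l):
    return value(l) == 0


def safe(i, l):
    # Toy property: the counter never shows the value 3.
    return value(l) != 3


def step_mod3(i, l):
    return encode((value(l) + 1) % 3)


def step_mod4(i, l):
    return encode((value(l) + 1) % 4)


# The property itself is used as the candidate invariant in both runs.
good = check_inductive_invariant(2, 0, is_reset, step_mod3, safe, safe)
bad = check_inductive_invariant(2, 0, is_reset, step_mod4, safe, safe)
print(good)
print(bad)
```

For the modulo-3 counter all three conditions hold, whereas for the modulo-4 counter consecution fails because the state 2 steps to the bad state 3. A real certifier would instead hand the negation of each condition to a SAT solver, as described above.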

Furthermore, we implemented the combinational simulation checker for veri- 
fying the combinational simulation relation described in Definition 7. The checker 
takes as inputs the original circuit and the k-witness circuit. It generates two 
AIGER files for the transition check and the property check, as well as a 
QAIGER file for the reset check, as defined in Definition 7. Similar to the induc- 
tive invariant checker, the AIGER files are then converted to CNFs and verified 
by Kissat. QAIGER is a standard format used in QBF Competitions. In our 
experiments the formula is verified with the QBF solver QuAbS [35]. 

The tool CERTIFAIGER returns “SUCCESS” as a result if all six formulas hold,
meaning that the circuit C’ combinationally simulates C and C’ is safe by the
1-induction proof. Thus by Theorem 1 the original circuit C is also safe. Note
that this result holds regardless of how C’ is constructed.

If we are willing to trust the correctness of the extended circuit mapping
inside the k-witness generator (that is, to trust that the k-witness circuit
construction of Definition 10 is correct and that the program implementing it
is also provably correct), all three combinational simulation
checks (one QBF and two SAT checks) could be skipped in the certification
procedure.

Intuitively, given a faulty generation of the k-witness circuit C’, the error
would either be caught by the combinational simulation check (due to an
erroneous under-approximation of the set of reachable states) or the inductive
invariant check (due to an erroneous over-approximation of the set of reachable states).
Furthermore, we have also done a sanity check of certification on failure, where
the model checking results are falsified by CERTIFAIGER. An incorrect value of k
is detected by a negative result of $\phi_{consec}$, whereas $\phi_{init}$ does not hold in cases
where an initial state is a bad state.


6 Experiments 


As described in previous sections, the complexity of extending the original cir- 
cuit into k-witness is linear in the size of the circuit, and the inductive depth. 
To evaluate the practicality of our tool, we now report the experimental results 
obtained by evaluating CERTIFAIGER against a number of widely used bench- 
marks. The benchmarks were first run on the open source k-induction-based 
model checker McAiger [3], which was modified to give the values of k explicitly. 
All experiments were carried out on an Intel® Core™ i9-9900 CPU 3.60 GHz 
computer with 32 GB RAM running Manjaro with kernel version 5.4.72-1. 

We start with the TIP suite benchmarks which were originally used in [18].
The benchmarks were converted from .smv to AIGER by invoking SMVTOAIG
from the AIGER library. Table 1 reports the certification results obtained, where
the file names follow the origin of the problems as explained in [18]. The
table displays the following information in each column:

1. the name of the AIGER file,
2. the verification time on McAiger,
3. the size of the original circuit, in terms of the number of gates (thousands),
4. the k-inductive value k given by the model checker,
5. the size of the k-witness circuit,
6. the time taken on the k-witness generator, and
7. the size and solving time (seconds) of each condition.

Fig. 7. The time (a) and file size (b) comparison results for the TIP suite. The
benchmark names are shown on the x-axis. Average values are shown as the blue horizontal
line in each plot. The y-axis of (a) displays the time ratio of total certification time
and model checking time. The y-axis of (b) shows the expansion factor indicating the
comparison of circuit sizes (k-witness circuit vs. the original). (Color figure online)


Note that we selected only benchmarks that gave a positive model checking result,
since only in that case is the original property k-inductive. Moreover, three instances
that require simple path constraints (also called loopFree constraints in [34])
were ruled out. Handling these constraints is an interesting area for future study.
We retrieved the inductive depths k from the model checker McAiger, and
compared them with the results in [18] to ensure the values are identical. As shown in
Table 1, the values of k vary between 4 and 96. The SAT solver was able to
handle the proof checking without experiencing time-outs. We observe that the
k-witness circuit generation time is rather small compared with the model
checking time as well as the proof checking time. In the proof checking stage, Table 1
suggests that the SAT-solving time for $\phi_{consec}$ is much higher than for the rest of
the formulas. This is as expected, as the formula $\phi_{consec}$ is in general more
complicated than the rest, and appears to be the most difficult formula to solve. In
addition, QBF solving times are also worth noting: in a few cases the QBF solving
time is longer than for other formulas; in most cases, however, it is rather small.
To compare certification time with model checking time, we plotted the results
in Fig. 7a, where the y-axis shows the ratio of certification and model checking.
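Table 1. Experimental results for the TIP suite. The columns give the model checking
time (MC t), the original circuit size (#C), k, the k-witness circuit size (#C’), the
generator time (gen t), and, for each of the conditions $\phi_{init}$, $\phi_{consist}$, $\phi_{consec}$,
$\phi_{trans}$, $\phi_{prop}$, and $\phi_{reset}$, its size (#) and solving time (t).

Name         | MC t  | #C   | k  | #C’    | gen t | init # | init t | consist # | consist t | consec # | consec t | trans # | trans t | prop # | prop t | reset # | reset t
-------------|-------|------|----|--------|-------|--------|--------|-----------|-----------|----------|----------|---------|---------|--------|--------|---------|--------
c.periodic   | 6.01  | 1.56 | 96 | 215.79 | 0.06  | 242.91 | 5.62   | 215.79    | 0.06      | 424.80   | 57.54    | 217.42  | 0.15    | 217.28 | 0.06   | 216.06  | 85.25
n.guidance1  | 0.08  | 1.91 | 10 | 31.89  | 0.01  | 38.39  | 0.21   | 31.89     | 0.01      | 62.16    | 3.34     | 33.97   | 0.12    | 33.63  | 0.01   | 32.58   | 1.24
n.guidance7  | 8.57  | 2.00 | 27 | 91.22  | 0.03  | 109.35 | 3.58   | 91.216    | 0.02      | 177.90   | 18.24    | 93.39   | 0.12    | 93.04  | 0.02   | 91.90   | 25.6
n.tcasp2     | 0.08  | 3.02 | 6  | 32.54  | 0.01  | 39.76  | 0.16   | 32.542    | 0.01      | 63.28    | 2.70     | 35.92   | 0.26    | 35.23  | 0.02   | 33.92   | 1.8
n.tcasp3     | 0.09  | 2.98 | 5  | 24.23  | 0.01  | 32.47  | 0.13   | 26.56     | 0.01      | 51.63    | 1.80     | 29.90   | 0.26    | 29.21  | 0.02   | 27.94   | 1.04
v.prodcell12 | 7.60  | 2.91 | 29 | 121.07 | 0.04  | 133.59 | 2.66   | 121.07    | 0.03      | 239.01   | 65.76    | 124.19  | 0.12    | 123.88 | 0.03   | 121.69  | 8.6
v.prodcell13 | 0.10  | 2.91 | 8  | 32.41  | 0.01  | 35.78  | 0.20   | 32.41     | 0.01      | 63.97    | 3.55     | 35.52   | 0.12    | 35.21  | 0.01   | 33.03   | 0.2
v.prodcell14 | 0.81  | 2.91 | 16 | 66.03  | 0.02  | 72.88  | 0.73   | 66.03     | 0.02      | 130.34   | 17.30    | 69.14   | 0.12    | 68.83  | 0.02   | 66.65   | 1.48
v.prodcell15 | 2.94  | 2.91 | 23 | 95.60  | 0.03  | 105.51 | 2.05   | 95.60     | 0.03      | 188.74   | 35.87    | 98.72   | 0.12    | 98.41  | 0.02   | 96.23   | 4.34
v.prodcell16 | 0.07  | 2.91 | 5  | 19.85  | 0.01  | 21.91  | 0.04   | 19.85     | 0.01      | 39.18    | 1.35     | 22.97   | 0.12    | 22.66  | 0.01   | 20.47   | 0.06
v.prodcell17 | 7.04  | 2.91 | 27 | 112.57 | 0.03  | 124.20 | 2.46   | 112.57    | 0.03      | 222.22   | 44.88    | 115.68  | 0.12    | 115.37 | 0.03   | 113.19  | 6.95
v.prodcell18 | 0.62  | 2.91 | 13 | 53.40  | 0.02  | 58.95  | 0.54   | 53.40     | 0.02      | 105.41   | 8.79     | 56.51   | 0.12    | 56.20  | 0.02   | 54.02   | 0.8
v.prodcell19 | 2.67  | 2.91 | 22 | 91.37  | 0.03  | 100.84 | 1.99   | 91.37     | 0.03      | 180.37   | 33.09    | 94.48   | 0.12    | 94.17  | 0.03   | 91.99   | 3.8
v.prodcell24 | 16.38 | 2.91 | 37 | 155.23 | 0.05  | 171.24 | 3.45   | 155.23    | 0.04      | 306.46   | 118.50   | 158.35  | 0.12    | 158.04 | 0.04   | 155.85  | 17.78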


Here certification time is the sum of time taken on each component, assuming
the six conditions are computed in parallel. As shown in the diagram, the
average time ratio is around 8, which is quite promising. Furthermore, Fig. 7b
shows a comparison of circuit sizes, where the expansion factor $\varepsilon$ is computed
by $\frac{\#C'}{\#C \cdot k}$ (alternatively, $\#C' = \varepsilon \cdot \#C \cdot k$). The average value observed here is
around 1.5. This is consistent with Definition 10, as we expected the size of the
k-witness circuit to grow linearly with respect to the original circuit and the
value of k.
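As a small worked illustration of the expansion factor, the sketch below evaluates $\varepsilon = \#C' / (\#C \cdot k)$ for two rows of Table 1. The function and variable names are hypothetical; only the numeric inputs are taken from the table.

```python
def expansion_factor(witness_size, circuit_size, k):
    """Expansion factor: #C' / (#C * k)."""
    return witness_size / (circuit_size * k)


# (original circuit size, k, k-witness circuit size), as listed in Table 1.
rows = {
    "c.periodic": (1.56, 96, 215.79),
    "v.prodcell24": (2.91, 37, 155.23),
}

factors = {
    name: expansion_factor(witness, circuit, k)
    for name, (circuit, k, witness) in rows.items()
}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}")
```

Both rows give a factor of roughly 1.44, in line with the reported average of about 1.5.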
Fig. 8. Certification time vs. model checking time obtained by running HWMCC’10
benchmarks.


We also used benchmarks from the Hardware Model Checking Competition
(HWMCC) 2010 [4]. The benchmarks were pre-filtered by running on McAiger
with a time-out of 15 min. A total of 513 instances were solved by McAiger,
from which we selected from the 216 unsat instances with a meaningful k (i.e.,
k > 2). We also observed that only 7 out of the 216 instances require simple path
constraints. The results in Fig. 8 are sorted by the benchmark names, which
enables us to compare individual benchmarks from the same family. In most
cases, similar to our previous observation from the TIP suite, the SAT solving
time of $\phi_{consec}$ takes much longer than the rest, while in very few cases it is less
than the QBF solving time for $\phi_{reset}$. The average time ratio is 30, where we
excluded 4 outliers in the plot coming from the pj20 family, which give a worse
result (total certification time >15 min). We observe that this was due to the
high format conversion time from QAIGER to QCIR [25] before the QBF solving
handled by QuAbS, while the actual QBF solving time was significantly smaller
and more feasible. We believe this can be overcome in practice by generating an
alternative format directly. Finally, similar to our previous TIP results, Fig. 9
shows the values of the expansion factor with an average of 1.5.

Fig. 9. The k-witness circuit size vs. the original circuit size.

In the final experiments, to further inspect the expansion factor, we generalise
the Counter example in Example 1, where we scale the number of bits to 500
with a modulo value 32. To clarify the complexity of our construction for the
k-witness circuit, we ran experiments with different values of k. The results are
shown in Fig. 10, where the x-axis shows the values of b up to 431, meaning the
value of k was scaled up to 400. The expansion factor gradually converges to a
constant as we increase the value of b, as we expected.

Fig. 10. The experimental results of the Counter example. The values of b are shown
on the x-axis.

As noted above, our approach works efficiently in the certification
stage. In particular, our implementation adopts the linear construction
of the k-witness circuit in Definition 10, so the size of the resulting AIGER circuit
is linear in the size of the original circuit and the value of k. Each component in
the tool suite works independently of the others when performing verification,
which increases trust in the verification results.


7 Conclusion 


We propose an approach to certify k-induction-based model checking results, 
by extending the model to produce an inductive invariant. The resulting tool, 
CERTIFAIGER, was evaluated experimentally on multiple sets of widely used 
benchmarks. The analysis showed our approach can be adapted to use in practice. 

Our certificates are linear in the size of the original problem and k. Validation
requires several SAT checks and solving a simple QBF. In related work [8, 23] 
the worst case is considered to be exponential. It is an interesting open question 
whether our notion of combinational simulation requiring a QBF check for the 
reset condition can be changed to use only SAT checks. 

Further, we only considered k-induction without simple path constraints,
even though such constraints on executions of the original model can in princi- 
ple be handled by adding unique state constraints to our k-witness circuit. For 
simplicity we stick to models without such constraints, a restriction also made 
for instance in the hardware model checking competition. Thus certifying k- 
induction with simple path constraints is left to future work as well as handling 
different types of properties such as liveness properties. 

We also want to extend our approach to common preprocessing techniques 
including temporal decomposition [11] or retiming [28] with the goal to obtain a 
single certificate (witness circuit). This goal is particularly challenging for com- 
plex multi-engine model checkers [9,10]. Furthermore, we believe our approach 
can be extended to infinite-state systems, where k-induction is commonly used. 
Abstract. The problem of model checking procedural programs has
fostered much research towards the definition of temporal logics for
reasoning on context-free structures. The most notable of such results are
temporal logics on Nested Words, such as CaRet and NWTL. Recently, 
the logic OPTL was introduced, based on the class of Operator Prece- 
dence Languages (OPL), more powerful than Nested Words. We define 
the new OPL-based logic POTL, and provide a model checking proce- 
dure for it. POTL improves on NWTL by enabling the formulation of 
requirements involving pre/post-conditions, stack inspection, and others 
in the presence of exception-like constructs. It improves on OPTL by 
being FO-complete, and by expressing more easily stack inspection and 
function-local properties. We developed a model checking tool for POTL, 
which we experimentally evaluate on some interesting use-cases. 


Keywords: Linear temporal logic · Operator precedence languages ·
Model Checking · Visibly pushdown languages · Input-driven languages


1 Introduction 


Model checking is one of the most successful techniques for the verification of soft- 
ware programs. It consists in the exhaustive verification of the mathematical model 
of a program against a specification of its desired behavior. The kind of proper- 
ties that can be proved in this way depends both on the formalism employed to 
model the program, and on the one used to express the specification. The initial 
and most classical frameworks consist in the use of operational formalisms, such 
as Transition Systems and Finite State Automata (generally Büchi automata)
for the model, and temporal logics such as Linear-time Temporal Logic (LTL), 
Computation-Tree Logic (CTL) and CTL* for the specification [24]. The success 
of such logics is due to their ease in reasoning about linear or branching sequences 
of events over time, by expressing liveness and safety properties, their conciseness 
with respect to automata, and the complexity of their model checking. 

In this paper we consider linear-time temporal domains. LTL limits its set
of expressible properties to the First-Order Logic (FOL) definable fragment
of regular languages. This is quite restrictive when compared with the most
popular abstract models of procedural programs, such as Pushdown Systems, 
Boolean Programs [10], and Recursive State Machines [3]. All such stack-based 
formalisms show behaviors that are expressible by means of Context-Free Lan- 
guages (CFL), rather than regular ones. State and configuration reachability, 
fair computation problems, and model checking of regular specifications have 
been thoroughly studied for such formalisms [3,4,13,17,28,30,32,40,51,55]. To 
expand the expressive power of specification languages too, [12,14] augmented 
LTL with Presburger arithmetic constraints on the occurrences of states, obtain- 
ing a logic capable of even some context-sensitive specifications, but with only 
restricted decidable fragments. [41] introduced model checking of pushdown tree 
automata specifications on regular systems, and Dynamic Logic was extended 
to some limited classes of CFL [34]. Decision procedures for different kinds of 
regular constraints on stack contents have been given in [18,29,37]. 

A coherent approach came with the introduction of temporal logics based 
on Visibly Pushdown Languages (VPL) [7], a.k.a. Input-Driven Languages [47]. 
Such logics, namely CaRet [6] and its FO-complete successor NWTL [2], model 
the execution trace of a procedural program as a Nested Word [8], consisting 
in a linear ordering augmented with a one-to-one matching relation between 
function calls and returns. They are the first ones featuring temporal modalities 
that explicitly refer to the nesting structure of CFL [4]. This enables require- 
ment specifications to include Hoare-style pre/post-conditions, stack-inspection 
properties, and more. A μ-calculus based on VPL extends model checking to
branching-time semantics in [5], while [16] introduces a temporal logic capturing 
the whole class of VPL. Timed extensions of CaRet are given in [15]. 

VPL too have their limitations. They are more general than Parenthesis Lan- 
guages [46], but their matching relation is essentially constrained to be one-to- 
one [43]. This hinders their suitability to model processes in which a single event 
must be put in relation with multiple ones. Unfortunately, computer programs 
often present such behaviors: exceptions and continuations are single events that 
cause the termination (or re-instantiation) of multiple functions on the stack. 

To reason about such behaviors, temporal logics based on Operator Precedence 
Languages (OPL) have been proposed [22]. OPL were initially introduced with the 
purpose of efficient parsing [31], a field in which they continue to offer useful appli- 
cations [11]. They are capable of capturing the syntax of arithmetic expressions, 
and other constructs whose context-free structure is not immediately visible. The 
generality of the structure of their syntax trees is much greater than that of VPL, 
which are strictly included in OPL [25]. Nevertheless, they retain the same closure 
properties that make regular languages and VPL suitable for automata-theoretic 
model checking: OPL are closed under Boolean operations, concatenation, and Kleene
*, and language emptiness and inclusion are decidable [42]. They have been
characterized by means of push-down automata, Monadic Second-Order Logic and,
recently, by an extension of Regular Expressions [42,44]. 

OPTL [22] is the first linear-time temporal logic for which a model checking
procedure has been given on both finite and ω-words of OPL. It enables reasoning
on procedural programs with exceptions, expressing properties about whether a
function can be terminated by an exception, or throw one, and also pre/post- 
conditions. NWTL can be translated into OPTL in linear time, thus the latter is 
capable of expressing all properties that can be formalized in CaRet and NWTL, 
and many more. [22] does not explore OPTL’s expressiveness further, and does 
not investigate the practical applicability of their model checking construction. 

In this article, we introduce Precedence Oriented Temporal Logic (POTL), 
which redefines the syntax and semantics of OPTL to be much closer to the 
context-free structure of words. With POTL, it is much easier to navigate a 
word’s syntax tree, expressing requirements that are aware of its structure. From 
a more theoretical point of view, POTL is FO-complete whereas OPTL is not, so 
that CaRet, NWTL, OPTL and POTL constitute a strict hierarchy in terms of 
expressive power. Such a theoretical elaboration, however, is technically involved; 
thus, for length reasons, it is documented in a technical report [23]. 

In this paper, instead, we focus on the model-checking application of POTL. 
We provide a tableaux-construction procedure for model checking POTL, which 
yields nondeterministic automata of size at most singly exponential in the for- 
mula’s length, and is thus not asymptotically greater than that of LTL, NWTL 
and OPTL. We implemented such a procedure in a tool called POMC, which we 
evaluate on several interesting case studies. POMC’s performance is promising: 
almost all case studies are verified in seconds and with a reasonable memory 
consumption, with very few outliers. Such outliers are inevitable, due to the 
exponential complexity of the task. 

The related work on tools is not as rich as the theoretical one. Tools and 
libraries such as VPAlib [48], VPAchecker [54], OpenNWA [27] and
SymbolicAutomata [26] only implement operations such as union, intersection,
universality/inclusion/emptiness check for Visibly Pushdown or Nested Word Automata,
but have no model checking capabilities. PAL [19] uses nested-word based moni- 
tors to express program specifications, and a tool based on BLAST [36] implements 
its runtime monitoring and model checking. PAL follows the paradigm of pro- 
gram monitors, and is not—strictly speaking—a temporal logic. PTCaRet [52] 
is a past version of CaRet, and its runtime monitoring has been implemented 
in JavaMOP [20]. [49,50] describe a tool for model checking programs against 
CaRet specifications. Since its purpose is malware detection, it targets program 
binaries directly by modeling them as Pushdown Systems. Unfortunately, this 
tool does not seem to be available online. To the best of our knowledge, POMC 
is the only publicly available¹ tool for model-checking temporal logics capable
of expressing context-free properties. 

The paper is organized as follows: we give some background on OPL in 
Sect. 2, we introduce POTL in Sect. 3 and its model checking in Sect. 4, and we 
evaluate our prototype model checker in Sect. 5. Due to space constraints, we 
leave all formal proofs to a technical report [21]. 


1 https://github.com/michiari/POMC.

2 Operator Precedence Languages


We assume some familiarity with classical formal language theory concepts 
such as context-free grammar, parsing, shift-reduce algorithm, syntax tree (ST) 
[33,35]. Operator Precedence Languages (OPL) are usually defined through their 
generating grammars [31]; in this paper, however, we characterize them through 
their accepting automata [42] which are the natural way for stating equiva- 
lence properties with logic characterization, and for model checking. Readers 
not familiar with OPL may refer to [43] for more explanations on the following 
basic concepts; an explanatory example is also given at the end of this section. 

Let $\Sigma$ be a finite alphabet, and $\varepsilon$ the empty string. We use a special symbol
$\# \notin \Sigma$ to mark the beginning and the end of any string. An operator precedence
matrix (OPM) M over $\Sigma$ is a partial function $(\Sigma \cup \{\#\})^2 \to \{\lessdot, \doteq, \gtrdot\}$, that,
for each ordered pair (a, b), defines the precedence relation (PR) M(a, b) holding
between a and b. If the function is total we say that M is complete. We call the
pair $(\Sigma, M)$ an operator precedence alphabet. Relations $\lessdot$, $\doteq$, $\gtrdot$ are respectively
named yields precedence, equal in precedence, and takes precedence. By
convention, the initial # yields precedence, and other symbols take precedence on the
ending #. If $M(a, b) = \pi$, where $\pi \in \{\lessdot, \doteq, \gtrdot\}$, we write $a \mathrel{\pi} b$. For $u, v \in \Sigma^+$ we
write $u \mathrel{\pi} v$ if $u = xa$ and $v = by$ with $a \mathrel{\pi} b$. The role of PR is to give structure
to words: they can be seen as special and more concise parentheses, where e.g.
one “closing” $\gtrdot$ can match more than one “opening” $\lessdot$. Despite their graphical
appearance, PR are not ordering relations.
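As an illustration of how an OPM can be represented, the following Python sketch stores the matrix $M_{call}$ of Fig. 1 as a dictionary from ordered pairs to relations and lists the precedence relations between consecutive symbols of a word. The ASCII characters `<`, `=`, `>` stand in for $\lessdot$, $\doteq$, $\gtrdot$; the names `M_CALL`, `precedence`, `relations`, and `is_complete` are hypothetical and chosen only for this example.

```python
SIGMA = ["call", "ret", "han", "exc"]

# Rows of M_call as in Fig. 1; column order follows SIGMA.
ROWS = {
    "call": "<=<>",
    "ret":  ">>>>",
    "han":  "<><=",
    "exc":  ">>>>",
}

M_CALL = {(a, b): rel for a, row in ROWS.items() for b, rel in zip(SIGMA, row)}

# Convention from the text: the initial # yields precedence to every symbol,
# and every symbol takes precedence on the ending #.
for a in SIGMA:
    M_CALL[("#", a)] = "<"
    M_CALL[(a, "#")] = ">"


def precedence(opm, a, b):
    """Return the relation between a and b, or None if M(a, b) is undefined."""
    return opm.get((a, b))


def relations(opm, word):
    """Relations between consecutive symbols of #word#."""
    symbols = ["#"] + word.split() + ["#"]
    return [precedence(opm, a, b) for a, b in zip(symbols, symbols[1:])]


def is_complete(opm, sigma):
    """True if the matrix is total on pairs of alphabet symbols."""
    return all((a, b) in opm for a in sigma for b in sigma)


rels = relations(M_CALL, "call han call call exc call ret ret")
print("".join(rels))
print(is_complete(M_CALL, SIGMA))
```

A dictionary keyed by pairs makes the partiality of the function explicit: a missing key corresponds to an undefined entry of the matrix.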


Definition 1. An operator precedence automaton (OPA) is a tuple
$\mathcal{A} = (\Sigma, M, Q, I, F, \delta)$ where: $(\Sigma, M)$ is an operator precedence alphabet, Q is a finite
set of states (disjoint from $\Sigma$), $I \subseteq Q$ is the set of initial states, $F \subseteq Q$ is the
set of final states, $\delta \subseteq Q \times (\Sigma \cup Q) \times Q$ is the transition relation, which is the
union of the three disjoint relations $\delta_{shift} \subseteq Q \times \Sigma \times Q$, $\delta_{push} \subseteq Q \times \Sigma \times Q$, and
$\delta_{pop} \subseteq Q \times Q \times Q$. An OPA is deterministic iff I is a singleton, and all three
components of $\delta$ are (possibly partial) functions.


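To define the semantics of OPA, we need some new notations. Letters
$p, q, p_i, q_i, \ldots$ denote states in Q. We use $q_0 \overset{a}{\longrightarrow} q_1$ for $(q_0, a, q_1) \in \delta_{push}$,
$q_0 \overset{a}{\dashrightarrow} q_1$ for $(q_0, a, q_1) \in \delta_{shift}$, $q_0 \overset{q_2}{\Longrightarrow} q_1$ for $(q_0, q_2, q_1) \in \delta_{pop}$, and
$q_0 \overset{w}{\leadsto} q_1$, if the automaton can read $w \in \Sigma^*$ going from $q_0$ to $q_1$. Let
$\Gamma = \Sigma \times Q$ and $\Gamma' = \Gamma \cup \{\bot\}$ be the stack alphabet; we denote symbols in $\Gamma'$ as
$[a, q]$ or $\bot$. We set $smb([a, q]) = a$, $smb(\bot) = \#$, and $st([a, q]) = q$. For a stack
content $\gamma = \gamma_n \ldots \gamma_1 \bot$, with $\gamma_i \in \Gamma$, $n \geq 0$, we set $smb(\gamma) = smb(\gamma_n)$ if $n \geq 1$,
$smb(\gamma) = \#$ if $n = 0$.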

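A configuration of an OPA is a triple $c = \langle w, q, \gamma \rangle$, where $w \in \Sigma^*\#$, $q \in Q$,
and $\gamma \in \Gamma^*\bot$. A computation or run is a finite sequence $c_0 \vdash c_1 \vdash \ldots \vdash c_n$ of
moves or transitions $c_i \vdash c_{i+1}$. There are three kinds of moves, depending on
the PR between the symbol on top of the stack and the next input symbol:

Push move: if $smb(\gamma) \lessdot a$ then $\langle ax, p, \gamma \rangle \vdash \langle x, q, [a, p]\gamma \rangle$, with $(p, a, q) \in \delta_{push}$;
Shift move: if $a \doteq b$ then $\langle bx, q, [a, p]\gamma \rangle \vdash \langle x, r, [b, p]\gamma \rangle$, with $(q, b, r) \in \delta_{shift}$;
Pop move: if $a \gtrdot b$ then $\langle bx, q, [a, p]\gamma \rangle \vdash \langle bx, r, \gamma \rangle$, with $(q, p, r) \in \delta_{pop}$.

       | call       | ret        | han        | exc
call   | $\lessdot$ | $\doteq$   | $\lessdot$ | $\gtrdot$
ret    | $\gtrdot$  | $\gtrdot$  | $\gtrdot$  | $\gtrdot$
han    | $\lessdot$ | $\gtrdot$  | $\lessdot$ | $\doteq$
exc    | $\gtrdot$  | $\gtrdot$  | $\gtrdot$  | $\gtrdot$

#[call[[[han[call[call[call]]]exc]call ret]call ret]ret]#

Fig. 1. OPM $M_{call}$ (left) and a string with chains shown by brackets (right).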


Shift and pop moves are not performed when the stack contains only $\bot$. Push
moves put a new element on top of the stack consisting of the input symbol
together with the current state of the OPA. Shift moves update the top element
of the stack by changing its input symbol only. Pop moves remove the element
on top of the stack, and update the state of the OPA according to $\delta_{pop}$ on the
basis of the current state of the OPA and the state of the removed stack symbol.
They do not consume the input symbol, which is used only to establish the $\gtrdot$
relation, remaining available for the next move. The OPA accepts the language
$L(\mathcal{A}) = \{x \in \Sigma^* \mid \langle x\#, q_I, \bot \rangle \vdash^* \langle \#, q_F, \bot \rangle, q_I \in I, q_F \in F\}$.
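The three kinds of moves can be sketched as a small nondeterministic simulator. The Python code below is an illustrative sketch, not the POMC implementation: it explores all reachable configurations (input position, state, stack) and reports acceptance when the input is consumed, the stack is empty, and the state is final. The dictionary layout of the automaton and the names `accepts`, `max_automaton`, and `NO_EXC_SHIFT` are assumptions made for this example; `M_CALL` encodes the OPM of Fig. 1 with `<`, `=`, `>` standing in for $\lessdot$, $\doteq$, $\gtrdot$.

```python
SIGMA = ["call", "ret", "han", "exc"]
ROWS = {"call": "<=<>", "ret": ">>>>", "han": "<><=", "exc": ">>>>"}
M_CALL = {(a, b): r for a, row in ROWS.items() for b, r in zip(SIGMA, row)}
for a in SIGMA:
    M_CALL[("#", a)] = "<"   # the initial # yields precedence
    M_CALL[(a, "#")] = ">"   # every symbol takes precedence on the ending #


def accepts(opa, opm, word):
    """Explore all runs of the OPA on word; True if some run accepts."""
    tokens = word.split() + ["#"]
    # Configuration: (input position, state, stack); the stack is a tuple of
    # (symbol, state) pairs with its top at the end. () plays the role of bottom.
    frontier = [(0, q, ()) for q in opa["initial"]]
    seen = set(frontier)
    while frontier:
        pos, state, stack = frontier.pop()
        look = tokens[pos]
        if look == "#" and not stack:
            if state in opa["final"]:
                return True
            continue
        top = stack[-1][0] if stack else "#"
        rel = opm.get((top, look))
        successors = []
        if rel == "<" and look != "#":
            # Push: consume the symbol and store it with the current state.
            successors = [(pos + 1, r, stack + ((look, state),))
                          for (p, a, r) in opa["push"] if p == state and a == look]
        elif rel == "=" and stack and look != "#":
            # Shift: consume the symbol, replace the top symbol, keep its state.
            successors = [(pos + 1, r, stack[:-1] + ((look, stack[-1][1]),))
                          for (p, a, r) in opa["shift"] if p == state and a == look]
        elif rel == ">" and stack:
            # Pop: do not consume input; the transition is labelled with the
            # state stored in the removed stack symbol.
            successors = [(pos, r, stack[:-1])
                          for (p, popped, r) in opa["pop"]
                          if p == state and popped == stack[-1][1]]
        for conf in successors:
            if conf not in seen:
                seen.add(conf)
                frontier.append(conf)
    return False


def max_automaton(sigma):
    """Single-state OPA that has every push, shift and pop transition."""
    return {
        "initial": {"q"}, "final": {"q"},
        "push": {("q", a, "q") for a in sigma},
        "shift": {("q", a, "q") for a in sigma},
        "pop": {("q", "q", "q")},
    }


MAX = max_automaton(SIGMA)
# Hypothetical variant without a shift transition on exc.
NO_EXC_SHIFT = dict(MAX, shift={t for t in MAX["shift"] if t[1] != "exc"})

W_EX = "call han call call exc call ret ret"
print(accepts(MAX, M_CALL, W_EX))
print(accepts(NO_EXC_SHIFT, M_CALL, W_EX))
```

The single-state automaton accepts the sample word, while the variant without a shift transition on exc gets stuck when han $\doteq$ exc requires a shift move. The set of visited configurations guarantees termination, since positions, states, and stack depth are all bounded by the input.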

We now introduce the concept of chain, which makes the connection between 
OP relations and context-free structure explicit, through brackets. 


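Definition 2. A simple chain ${}^{c_0}[c_1 c_2 \ldots c_\ell]^{c_{\ell+1}}$ is a string $c_0 c_1 c_2 \ldots c_\ell c_{\ell+1}$, such
that: $c_0, c_{\ell+1} \in \Sigma \cup \{\#\}$, $c_i \in \Sigma$ for every $i = 1, 2, \ldots, \ell$ ($\ell \geq 1$), and
$c_0 \lessdot c_1 \doteq c_2 \ldots c_{\ell-1} \doteq c_\ell \gtrdot c_{\ell+1}$. A composed chain is a string
$c_0 s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell c_{\ell+1}$, where ${}^{c_0}[c_1 c_2 \ldots c_\ell]^{c_{\ell+1}}$ is a simple chain, and
$s_i \in \Sigma^*$ is the empty string or is such that ${}^{c_i}[s_i]^{c_{i+1}}$ is a chain (simple or
composed), for every $i = 0, 1, \ldots, \ell$ ($\ell \geq 1$). Such a composed chain will be written as
${}^{c_0}[s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell]^{c_{\ell+1}}$. $c_0$ (resp. $c_{\ell+1}$) is called its left (resp. right)
context; all symbols between them form its body.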


A finite word w over $\Sigma$ is compatible with an OPM M iff for each pair of
letters c, d, consecutive in w, M(c, d) is defined and, for each substring x of #w#
that is a chain of the form ${}^a[y]^b$, M(a, b) is defined.

Chains can be identified through the traditional operator precedence
parsing algorithm. We apply it to the sample word $w_{ex}$ =
call han call call exc call ret ret, which is compatible with $M_{call}$ (for a more
complete treatment, cf. [33,43]). First, write all precedence relations between
consecutive characters, according to $M_{call}$. Then, recognize all innermost
patterns of the form $a \lessdot c \doteq \ldots \doteq c \gtrdot b$ as simple chains, and remove their bodies.
Then, write the precedence relations between the left and right contexts of the
removed body, a and b, and iterate this process until only ## remains. This
procedure is applied to $w_{ex}$ as follows:


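1  $\# \lessdot \mathrm{call} \lessdot \mathrm{han} \lessdot \mathrm{call} \lessdot \underline{\mathrm{call}} \gtrdot \mathrm{exc} \gtrdot \mathrm{call} \doteq \mathrm{ret} \gtrdot \mathrm{ret} \gtrdot \#$
2  $\# \lessdot \mathrm{call} \lessdot \mathrm{han} \lessdot \underline{\mathrm{call}} \gtrdot \mathrm{exc} \gtrdot \mathrm{call} \doteq \mathrm{ret} \gtrdot \mathrm{ret} \gtrdot \#$
3  $\# \lessdot \mathrm{call} \lessdot \underline{\mathrm{han} \doteq \mathrm{exc}} \gtrdot \mathrm{call} \doteq \mathrm{ret} \gtrdot \mathrm{ret} \gtrdot \#$
4  $\# \lessdot \mathrm{call} \lessdot \underline{\mathrm{call} \doteq \mathrm{ret}} \gtrdot \mathrm{ret} \gtrdot \#$
5  $\# \lessdot \underline{\mathrm{call} \doteq \mathrm{ret}} \gtrdot \#$
6  $\# \doteq \#$

The chain body removed in each step is underlined. In step 1, ${}^{\mathrm{call}}[\mathrm{call}]^{\mathrm{exc}}$ is
a simple chain, so its body call is removed. Then, in step 2 we recognize the
simple chain ${}^{\mathrm{han}}[\mathrm{call}]^{\mathrm{exc}}$, which means ${}^{\mathrm{han}}[\mathrm{call}[\mathrm{call}]]^{\mathrm{exc}}$, where [call] is the
chain body removed in step 1, is a composed chain. This way, we recognize,
e.g., ${}^{\mathrm{han}}[\mathrm{call}]^{\mathrm{exc}}$, ${}^{\mathrm{call}}[\mathrm{han}\ \mathrm{exc}]^{\mathrm{call}}$ as simple chains, and ${}^{\mathrm{han}}[\mathrm{call}[\mathrm{call}]]^{\mathrm{exc}}$ and
${}^{\mathrm{call}}[\mathrm{han}[\mathrm{call}[\mathrm{call}]]\mathrm{exc}]^{\mathrm{call}}$ as composed chains (with inner chain bodies enclosed
in brackets). Figure 1 shows the structure of a longer version of $w_{ex}$, which is an
isomorphic representation of its ST as depicted in Fig. 4. Each chain corresponds
to an internal node, and the fringe of the subtree rooted at it is the chain’s body.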
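The chain-recognition procedure above can be mechanised with a stack, in the style of a standard operator precedence parser. The following Python sketch is an illustration under stated assumptions rather than the authors' algorithm as implemented in any tool: it reads a word left to right, pushes symbols on $\lessdot$ and $\doteq$, reduces a chain body on $\gtrdot$, and returns the word with every chain body enclosed in brackets. The names `bracket_chains` and `glue` are hypothetical, and `M_CALL` encodes the OPM of Fig. 1 with `<`, `=`, `>` standing in for the three relations.

```python
SIGMA = ["call", "ret", "han", "exc"]
ROWS = {"call": "<=<>", "ret": ">>>>", "han": "<><=", "exc": ">>>>"}
M_CALL = {(a, b): r for a, row in ROWS.items() for b, r in zip(SIGMA, row)}
for a in SIGMA:
    M_CALL[("#", a)] = "<"
    M_CALL[(a, "#")] = ">"


def glue(parts):
    """Concatenate parts, separating adjacent words with a single space."""
    out = ""
    for part in parts:
        if part and out and out[-1].isalnum() and part[0].isalnum():
            out += " "
        out += part
    return out


def bracket_chains(opm, word):
    """Return #word# with every chain body enclosed in brackets."""
    # Stack entries: (symbol, pushed with '<', chains reduced just before it).
    stack = [("#", False, "")]
    pending = ""  # bracketed chains reduced right after the current stack top
    for look in word.split() + ["#"]:
        while True:
            top = stack[-1][0]
            if top == "#" and look == "#":
                break
            rel = opm.get((top, look))
            if rel in ("<", "="):
                stack.append((look, rel == "<", pending))
                pending = ""
                break
            if rel != ">":
                raise ValueError(f"no relation between {top!r} and {look!r}")
            # Reduce: pop back to the symbol that was pushed with '<'.
            parts = [pending]
            while True:
                if len(stack) == 1:
                    raise ValueError("word is not compatible with the OPM")
                symbol, marked, before = stack.pop()
                parts[:0] = [before, symbol]
                if marked:
                    break
            pending = "[" + glue(parts) + "]"
    return "#" + pending + "#"


print(bracket_chains(M_CALL, "call han call call exc call ret ret"))
print(bracket_chains(M_CALL, "call han call call call exc call ret call ret ret"))
```

On the second input, the longer word of Fig. 1, the function reproduces the bracketed string shown in the figure. Each reduction corresponds to one underlined chain body in the step-by-step derivation.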

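Let $\mathcal{A}$ be an OPA. We call a support for the simple chain ${}^{c_0}[c_1 c_2 \ldots c_\ell]^{c_{\ell+1}}$
any path in $\mathcal{A}$ of the form
$q_0 \overset{c_1}{\longrightarrow} q_1 \dashrightarrow \ldots \dashrightarrow q_{\ell-1} \overset{c_\ell}{\dashrightarrow} q_\ell \overset{q_0}{\Longrightarrow} q_{\ell+1}$. The
label of the last (and only) pop is exactly $q_0$, i.e. the first state of the path; this
pop is executed because of relation $c_\ell \gtrdot c_{\ell+1}$. We call a support for the composed
chain ${}^{c_0}[s_0 c_1 s_1 c_2 \ldots c_\ell s_\ell]^{c_{\ell+1}}$ any path in $\mathcal{A}$ of the form
$q_0 \overset{s_0}{\leadsto} q'_0 \overset{c_1}{\longrightarrow} q_1 \overset{s_1}{\leadsto} q'_1 \overset{c_2}{\dashrightarrow} \ldots \overset{c_\ell}{\dashrightarrow} q_\ell \overset{s_\ell}{\leadsto} q'_\ell \overset{q'_0}{\Longrightarrow} q_{\ell+1}$
where, for every $i = 0, 1, \ldots, \ell$: if $s_i \neq \varepsilon$, then $q_i \overset{s_i}{\leadsto} q'_i$
is a support for the chain ${}^{c_i}[s_i]^{c_{i+1}}$, else $q'_i = q_i$.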

Chains fully determine the parsing structure of any OPA over $(\Sigma, M)$. If
the OPA performs the computation $\langle sb, q_i, [a, q_j]\gamma \rangle \vdash^* \langle b, q_k, \gamma \rangle$, then ${}^a[s]^b$ is
necessarily a chain over $(\Sigma, M)$, and there exists a support like the one above
with $s = s_0 c_1 \ldots c_\ell s_\ell$ and $q_{\ell+1} = q_k$. This corresponds to the parsing of the
string $s_0 c_1 \ldots c_\ell s_\ell$ within the contexts a, b, which contains all information needed
to build the subtree whose frontier is that string.

Consider the OPA $\mathcal{A}(\Sigma, M) = \langle \Sigma, M, \{q\}, \{q\}, \{q\}, \delta_{max} \rangle$ where $\delta_{max}(q, q) = q$,
and $\delta_{max}(q, c) = q$, $\forall c \in \Sigma$. We call it the OP Max-Automaton over $\Sigma, M$. For
a max-automaton, each chain has a support. Since there is a chain ${}^{\#}[s]^{\#}$ for any
string $s$ compatible with $M$, a string is accepted by $\mathcal{A}(\Sigma, M)$ iff it is compatible
with $M$. If $M$ is complete, each string is accepted by $\mathcal{A}(\Sigma, M)$, which defines the
universal language $\Sigma^*$ by assigning to any string the (unique) structure compatible
with the OPM. With $M_{\mathbf{call}}$ of Fig. 1, if we take e.g. the string ret call han,
it is accepted by the max-automaton with structure #[[ret]call[han]]#.
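
The way a max-automaton assigns a structure to a string can be sketched with a small stack-based parser. The following Python fragment is only an illustration, not the authors' implementation: the dictionary `PR` lists just the precedence relations that are attested in the text, the treatment of the delimiter (it yields to every symbol, and every symbol takes precedence over it) is an assumption, and the names `relation`, `join` and `structure` are hypothetical.

```python
# Precedence relations attested in the text ('<' = yields, '=' = equal, '>' = takes).
PR = {
    ("call", "call"): "<", ("call", "han"): "<", ("han", "call"): "<",
    ("call", "ret"): "=", ("han", "exc"): "=",
    ("call", "exc"): ">", ("exc", "call"): ">",
    ("ret", "call"): ">", ("ret", "ret"): ">",
}


def relation(a, b):
    """PR between a and b; the handling of '#' is an assumption of this sketch."""
    if a == "#":
        return "<"
    if b == "#":
        return ">"
    if (a, b) not in PR:
        raise ValueError(f"{a} {b}: no relation listed, word not compatible")
    return PR[(a, b)]


def join(parts):
    """Concatenate body items, separating two adjacent terminals with a space."""
    out = ""
    for part in parts:
        if out and out[-1].isalnum() and part[0].isalnum():
            out += " "
        out += part
    return out


def structure(word):
    """Return the bracketed chain structure of `word` (a list of terminals)."""
    stack = []      # entries: [last terminal read, items of the chain body so far]
    pending = None  # bracketed body of the chain closed by the latest pop
    for b in word + ["#"]:
        while True:
            top = stack[-1][0] if stack else "#"
            if top == "#" and b == "#":
                break
            rel = relation(top, b)
            if rel == ">":                  # pop: the chain on top is complete
                _, body = stack.pop()
                if pending:
                    body.append(pending)
                pending = "[" + join(body) + "]"
                continue
            inner = [pending] if pending else []
            pending = None
            if rel == "<":                  # push: a new chain body starts at b
                stack.append([b, inner + [b]])
            else:                           # shift: b extends the current body
                stack[-1][0] = b
                stack[-1][1] += inner + [b]
            break
    return "#" + (pending or "") + "#"


print(structure(["ret", "call", "han"]))  # #[[ret]call[han]]#
```

Pushes, shifts and pops mirror the three kinds of OPA moves; since the max-automaton has a single state, only the bracket structure is tracked here.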

In conclusion, given an OP alphabet, the OPM M assigns a unique structure 
to any compatible string in $\Sigma^*$; unlike VPL, such a structure is not visible in the
string, and must be built by means of a non-trivial parsing algorithm. An OPA 
defined on the OP alphabet selects an appropriate subset within the “universe” 
of strings compatible with M. For a more complete description of the OPL family 
and of its relations with other CFL we refer the reader to [43]. 


2.1 Operator Precedence ω-Languages


All definitions regarding OPL are extended to infinite words in the usual way,
but with a few distinctions. Given an OP alphabet $(\Sigma, M)$, an ω-word $w \in \Sigma^\omega$
is compatible with $M$ if every prefix of $w$ is compatible with $M$. OP ω-words
are not terminated by the delimiter #. An ω-word may contain never-ending
chains of the form $c_0 \lessdot c_1 \doteq c_2 \doteq \cdots$, where the $\lessdot$ relation between $c_0$ and $c_1$
is never closed by a corresponding $\gtrdot$. Such chains are called open chains and
may be simple or composed. A composed open chain may contain both open
and closed subchains. Of course, a closed chain cannot contain an open one. A
terminal symbol $a \in \Sigma$ is pending if it is part of the body of an open chain and
of no closed chains.

OPA classes accepting the whole class of ωOPL can be defined by augmenting
Definition 1 with Büchi or Muller acceptance conditions [42]. In this paper, we
only consider the former. The semantics of configurations, moves and infinite
runs are defined as for finite OPA. For the acceptance condition, let $\rho$ be a run
on an ω-word $w$. Define

$$\mathrm{Inf}(\rho) = \{q \in Q \mid \text{there exist infinitely many positions } i \text{ s.t. } \langle \beta_i, q, x_i \rangle \in \rho\}$$

as the set of states that occur infinitely often in $\rho$. $\rho$ is successful iff there exists
a state $q_f \in F$ such that $q_f \in \mathrm{Inf}(\rho)$. An ωOPBA $\mathcal{A}$ accepts $w \in \Sigma^\omega$ iff there is a
successful run of $\mathcal{A}$ on $w$. The ω-language recognized by $\mathcal{A}$ is
$L(\mathcal{A}) = \{w \in \Sigma^\omega \mid \mathcal{A} \text{ accepts } w\}$. Unlike OPA, ωOPBA do not require the stack to be empty for
word acceptance: when reading an open chain, the stack symbol pushed when
the first character of the body of its underlying simple chain is read remains in
the stack forever; it is at most updated by shift moves.

The most important closure properties of OPL are preserved by ωOPL,
which form a Boolean algebra and are closed under concatenation of an OPL
with an ωOPL [42]. The equivalence between deterministic and nondeterministic
automata is lost in the infinite case, which is unsurprising, since it also happens
for regular ω-languages and ωVPL.


2.2 Modeling Programs with OPA 


For readers not familiar with OPL, we show how OPA can naturally model
programming languages such as Java and C++. Given a set $AP$ of atomic
propositions describing events and states of the program, we use $(\mathcal{P}(AP), M_{AP})$ as the
OP alphabet. For convenience, we consider a partitioning of $AP$ into a set of
standard propositional labels (in round font), and structural labels (SL, in bold).
SL define the OP structure of the word: $M_{AP}$ is only defined for subsets of $AP$
containing exactly one SL, so that given two SL $l_1, l_2$, for any $a, a', b, b' \in \mathcal{P}(AP)$
s.t. $l_1 \in a, a'$ and $l_2 \in b, b'$ we have $M_{AP}(a, b) = M_{AP}(a', b')$. Hence, we define
an OPM on the entire $\mathcal{P}(AP)$ by only giving the relations between SL, as we did
for $M_{\mathbf{call}}$. Figure 2 shows how to model a procedural program with an OPA. The
OPA simulates the program’s behavior with respect to the stack, by expressing
its execution traces with four event kinds: call (resp. ret) marks a procedure call
(resp. return), han the installation of an exception handler by a try statement,
and exc an exception being raised. OPM $M_{\mathbf{call}}$ defines the context-free structure
of the word, which is strictly linked with the programming language semantics:
the $\lessdot$ PR causes nesting (e.g., calls can be nested into other calls), and the $\doteq$
PR implies a one-to-one relation, e.g. between a call and the ret of the same
function, and a han and the exc it catches. Each OPA state represents a line
in the source code. First, procedure $p_A$ is called by the program loader (M0),
and [{call, $p_A$}, M0] is pushed onto the stack, to track the program state before
the call. Then, the try statement at line A0 of $p_A$ installs a handler. All
subsequent calls to $p_B$ and $p_C$ push new stack symbols on top of the one pushed with
han. $p_C$ may only call itself recursively, or throw an exception, but never return
normally. This is reflected by exc being the only transition leading from state
C0 to the accepting state Mr, and $p_B$ and $p_C$ having no way to a normal ret.
The OPA has a look-ahead of one input symbol, so when it encounters exc, it
must pop all symbols in the stack, corresponding to active function frames, until
it finds the one with han in it, which cannot be popped because han $\doteq$ exc.
Notice that such behavior cannot be modeled by Visibly Pushdown Automata
or Nested Word Automata, because they need to read an input symbol for each
pop move. Thus, han protects the parent function from the exception. Since the
state contained in han’s stack symbol is A0, the execution resumes in the catch
clause of $p_A$. $p_A$ then calls the error-handling function $p_{Err}$ twice, which ends
regularly both times, and returns. The string of Fig. 1 is accepted by this OPA.

pA() {               pB() {            pC() {
A0:  try {           B0:  pC();        C0:  if (*) {
A1:    pB();         Br: }             C1:    throw;
A2:  } catch {                         C2:  } else {
A3:    pErr();                         C3:    pC();
A4:    pErr();                              }
     }                                 Cr: }
Ar: }

[OPA diagram: states such as M0, A0, A3, A4, Ar, B0, C0, connected by moves labelled call, han, exc, ret]

Fig. 2. Example procedural program (top) and the derived OPA (bottom). ‘*’ implies
a non-deterministic choice. Push, shift, pop moves are shown by, resp., solid, dashed
and double arrows.

In this example, we only model the stack behavior for simplicity, but other 
statements, such as assignments, and other behaviors, such as continuations, 
could be modeled by a different choice of the OPA and OPM, and other aspects 
of the program’s state by appropriate abstractions [38]. 
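
To make the link between the program of Fig. 2 and its traces concrete, the sketch below executes the program directly and logs the four event kinds. It is an illustration only: Python exceptions stand in for `throw`, the list `choices` (a name introduced here) resolves the non-deterministic test `*`, and the function names are hypothetical.

```python
class ProgramException(Exception):
    """Stands for the exception thrown at line C1 of Fig. 2."""


def run_program(choices):
    """Run the program of Fig. 2 and return its trace of (event, procedure) pairs.

    `choices` resolves the test '*' at line C0, one boolean per evaluation:
    True means throw, False means recurse.
    """
    trace = []
    choices = iter(choices)

    def p_err():
        trace.append(("call", "pErr"))
        trace.append(("ret", "pErr"))

    def p_c():
        trace.append(("call", "pC"))
        if next(choices):                 # C0: if (*)
            trace.append(("exc", None))
            raise ProgramException()      # C1: throw
        p_c()                             # C3: recursive call, never returns normally

    def p_b():
        trace.append(("call", "pB"))
        p_c()                             # B0: pC always ends with an exception

    def p_a():
        trace.append(("call", "pA"))
        trace.append(("han", None))       # A0: try installs a handler
        try:
            p_b()                         # A1
        except ProgramException:
            p_err()                       # A3
            p_err()                       # A4
        trace.append(("ret", "pA"))       # Ar

    p_a()
    return trace


events = [event for event, _ in run_program([False, True])]
print(" ".join(events))
```

With `choices = [False, True]`, pC recurses once and then throws, so the logged events are the eleven structural labels of the word of Fig. 3.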


3 POTL: Syntax and Semantics 
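Given a finite set of atomic propositions $AP$, the syntax of POTL follows:

$$\varphi ::= a \mid \neg\varphi \mid \varphi \lor \varphi \mid \bigcirc^t \varphi \mid \ominus^t \varphi \mid \chi_F^t \varphi \mid \chi_P^t \varphi \mid \varphi \,\mathcal{U}_\chi^t\, \varphi \mid \varphi \,\mathcal{S}_\chi^t\, \varphi$$
$$\mid \bigcirc_H^t \varphi \mid \ominus_H^t \varphi \mid \varphi \,\mathcal{U}_H^t\, \varphi \mid \varphi \,\mathcal{S}_H^t\, \varphi$$

where $a \in AP$, and $t \in \{d, u\}$.


$\# \lessdot \mathbf{call} \lessdot \mathbf{han} \lessdot \mathbf{call} \lessdot \mathbf{call} \lessdot \mathbf{call} \gtrdot \mathbf{exc} \gtrdot \mathbf{call} \doteq \mathbf{ret} \gtrdot \mathbf{call} \doteq \mathbf{ret} \gtrdot \mathbf{ret} \gtrdot \#$

pos.   0   1     2    3     4     5     6    7      8      9      10     11    12
SL     #   call  han  call  call  call  exc  call   ret    call   ret    ret   #
prop.      p_A        p_B   p_C   p_C        p_Err  p_Err  p_Err  p_Err  p_A


Fig. 3. The string of Fig. 1 as an OP word. Chains are shown by edges joining their
contexts. Standard atomic propositions are shown below SL: $p_l$ means a call or a ret is
related to procedure $p_l$. First, procedure $p_A$ is called (pos. 1), and it installs a handler
in pos. 2. Then, three procedures are called, and one ($p_C$) throws an exception, which
is caught by the handler. Two more functions are called and, finally, $p_A$ returns.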

The semantics of POTL is based on the word structure (also called OP word
for short) $\langle U, M_{AP}, P \rangle$, where $U = \{0, 1, \dots, n, n+1\}$, with $n \in \mathbb{N}$, is a set of word
positions; $P : U \to \mathcal{P}(AP)$ is a function associating each position in $U$ with the
set of atomic propositions holding in that position, with $P(0) = P(n+1) = \{\#\}$.
Given two positions $i, j$ and a PR $\pi$, we write $i \,\pi\, j$ to say $P(i) \,\pi\, P(j)$.

We define the chain relation $\chi \subseteq U \times U$ so that $\chi(i, j)$ holds between two
positions $i, j$ iff $i < j - 1$, and $i$ and $j$ are resp. the left and right contexts of the
same chain. For composed chains, $\chi$ may not be one-to-one, but also one-to-many
or many-to-one. Given $i, j \in U$, relation $\chi$ has the following properties:


1. It never crosses itself: if $\chi(i, j)$ and $\chi(h, k)$, for any $h, k \in U$, then we have
   $i < h < j \implies k \leq j$ and $i < k < j \implies i \leq h$.

2. If $\chi(i, j)$, then $i \lessdot i + 1$ and $j - 1 \gtrdot j$.

3. There exists at most one single position $h$, called leftmost context of $j$, s.t.
   $\chi(h, j)$ and $h \lessdot j$ or $h \doteq j$; for any $k$ s.t. $\chi(k, j)$ and $k \gtrdot j$ we have $k > h$.

4. There exists at most one single position $h$, called rightmost context of $i$, s.t.
   $\chi(i, h)$ and $i \gtrdot h$ or $i \doteq h$; for any $k$ s.t. $\chi(i, k)$ and $i \lessdot k$ we have $k < h$.


Property 4 says that when the chain relation is one-to-many, the contexts of
the outermost chains are in the $\doteq$ or $\gtrdot$ relation, while the inner ones are in the
$\lessdot$ relation. Property 3 says that contexts of outermost many-to-one chains are
in the $\doteq$ or $\lessdot$ relation, the inner ones being in the $\gtrdot$ relation. In the ST, the
right context $j$ of a chain is at the same level as the left one $i$ when $i \doteq j$ (e.g.,
in Fig. 4, pos. 1 and 11), at a lower level when $i \lessdot j$ (e.g., pos. 1 with 7, and 9),
at a higher level if $i \gtrdot j$ (e.g., pos. 3 and 4 with 6).
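
The chain relation of a concrete word can be computed with one pass of operator-precedence parsing. The sketch below is an illustration under stated assumptions, not the construction used in the paper: `PR` contains only the relations readable from the word of Fig. 3, the delimiter is assumed to yield to every symbol and to be overtaken by every symbol, and `chain_relation` is a hypothetical name.

```python
PR = {
    ("call", "call"): "<", ("call", "han"): "<", ("han", "call"): "<",
    ("call", "ret"): "=", ("han", "exc"): "=",
    ("call", "exc"): ">", ("exc", "call"): ">",
    ("ret", "call"): ">", ("ret", "ret"): ">",
}


def relation(a, b):
    """PR between two symbols; the treatment of '#' is an assumption."""
    if a == "#":
        return "<"
    if b == "#":
        return ">"
    return PR[(a, b)]


def chain_relation(word):
    """Return the set of pairs (i, j) with chi(i, j); positions 0 and n+1 hold '#'."""
    symbols = ["#"] + word + ["#"]
    last = len(symbols) - 1
    stack = []   # entries: [position of the last terminal of the body, left context]
    chi = set()
    for j in range(1, len(symbols)):
        while True:
            top = stack[-1][0] if stack else 0
            if top == 0 and j == last:
                break                       # both delimiters reached: parsing ends
            rel = relation(symbols[top], symbols[j])
            if rel == ">":                  # pop: j is the right context of a chain
                _, left = stack.pop()
                chi.add((left, j))
                continue
            if rel == "<":                  # push: top becomes a left context
                stack.append([j, top])
            else:                           # shift: the body grows, same left context
                stack[-1][0] = j
            break
    return chi


word = "call han call call call exc call ret call ret ret".split()
print(sorted(chain_relation(word)))
```

On the word of Fig. 3 the result contains (1, 7), (1, 9) and (1, 11), the one-to-many case discussed for Property 4, and (2, 6), (3, 6), (4, 6), the many-to-one case of Property 3.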

The truth of POTL formulas is defined w.r.t. a single word position. Let
$w$ be an OP word, and $a \in AP$. Then, for any position $i \in U$ of $w$, we have
$(w, i) \models a$ iff $a \in P(i)$. Operators such as $\land$ and $\neg$ have the usual semantics from
propositional logic. Next, while giving the formal semantics of POTL operators,
we illustrate it by showing how it can be used to express properties on program
execution traces, such as the one of Fig. 3.

a) Next/Back Operators. The downward next and back operators $\bigcirc^d$ and $\ominus^d$
are like their LTL counterparts, except they are true only if the next (resp.
current) position is at a lower or equal ST level than the current (resp. preceding)
one. The upward next and back, $\bigcirc^u$ and $\ominus^u$, are symmetric. Formally,
$(w, i) \models \bigcirc^d \varphi$ iff $(w, i+1) \models \varphi$ and $i \lessdot (i+1)$ or $i \doteq (i+1)$, and
$(w, i) \models \ominus^d \varphi$ iff $(w, i-1) \models \varphi$, and $(i-1) \lessdot i$ or $(i-1) \doteq i$. Substitute $\lessdot$ with $\gtrdot$ to
obtain the semantics for $\bigcirc^u$ and $\ominus^u$. E.g., we can write $\bigcirc^d \mathbf{call}$ to say that
the next position is an inner call (it holds in pos. 2, 3, 4 of Fig. 3), $\ominus^d \mathbf{call}$ to say
that the previous position is a call, and the current is the first of the body of a
function (pos. 2, 4, 5), or the ret of an empty one (pos. 8, 10), and $\ominus^u \mathbf{call}$ to say
that the current position terminates an empty function frame (holds in 6, 8, 10).
In pos. 2 $\bigcirc^d p_B$ holds, but $\bigcirc^u p_B$ does not.


Fig. 4. The ST corresponding to the word
of Fig. 3. Dots are internal nodes.

b) Chain Next/Back Operators. The chain next and back operators $\chi_F^t$ and
$\chi_P^t$, $t \in \{d, u\}$, evaluate their argument respectively on future and past positions
in the chain relation with the current one. The downward (resp. upward) variant
only considers chains whose right context goes down (resp. up) in the ST. E.g.,
in pos. 1 of Fig. 3, $\chi_F^d\, p_{Err}$ holds because $\chi(1, 7)$ and $\chi(1, 9)$, meaning that $p_A$
calls $p_{Err}$ at least once. Formally, $(w, i) \models \chi_F^d \varphi$ iff there exists a position $j > i$
such that $\chi(i, j)$, $i \lessdot j$ or $i \doteq j$, and $(w, j) \models \varphi$. $(w, i) \models \chi_P^d \varphi$ iff there exists
a position $j < i$ such that $\chi(j, i)$, $j \lessdot i$ or $j \doteq i$, and $(w, j) \models \varphi$. Replace $\lessdot$
with $\gtrdot$ for the upward versions. In Fig. 3, $\chi_F^u\, \mathbf{exc}$ is true in call positions whose
procedure is terminated by an exception thrown by an inner procedure (e.g. pos.
3 and 4). $\chi_P^u\, \mathbf{call}$ is true in exc statements that terminate at least one procedure
other than the one raising it, such as the one in pos. 6. $\chi_F^d\, \mathbf{ret}$ and $\chi_F^u\, \mathbf{ret}$ hold
in calls to non-empty procedures that terminate normally, and not due to an
uncaught exception (e.g., pos. 1).
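
A small evaluator makes the role of the chain relation in these operators explicit. This is a sketch under assumptions: the set `CHI` is written out by hand from the chains of the word of Fig. 3 (the pairs named in the text, plus (2, 6) for the handler and (0, 12) for the outermost chain), `# ≐ #` is assumed for the delimiters, and all identifiers are hypothetical.

```python
WORD = ["#", "call", "han", "call", "call", "call", "exc",
        "call", "ret", "call", "ret", "ret", "#"]
PROC = {1: "pA", 3: "pB", 4: "pC", 5: "pC", 7: "pErr", 8: "pErr",
        9: "pErr", 10: "pErr", 11: "pA"}
PR = {
    ("call", "call"): "<", ("call", "han"): "<", ("han", "call"): "<",
    ("call", "ret"): "=", ("han", "exc"): "=",
    ("call", "exc"): ">", ("exc", "call"): ">",
    ("ret", "call"): ">", ("ret", "ret"): ">",
}
# Chain relation of the word of Fig. 3, as pairs (left context, right context).
CHI = {(0, 12), (1, 7), (1, 9), (1, 11), (2, 6), (3, 6), (4, 6)}
ALLOWED = {"d": {"<", "="}, "u": {">", "="}}


def rel(i, j):
    """PR between positions i and j (delimiter handling is an assumption)."""
    a, b = WORD[i], WORD[j]
    if a == "#" and b == "#":
        return "="
    if a == "#":
        return "<"
    if b == "#":
        return ">"
    return PR[(a, b)]


def holds(label, i):
    return WORD[i] == label or PROC.get(i) == label


def chain_next(t, label, i):
    """True iff some j with chi(i, j), an admitted PR between i and j, satisfies label."""
    return any(holds(label, j) and rel(i, j) in ALLOWED[t]
               for (h, j) in CHI if h == i)


def chain_back(t, label, i):
    """True iff some j with chi(j, i), an admitted PR between j and i, satisfies label."""
    return any(holds(label, j) and rel(j, i) in ALLOWED[t]
               for (j, r) in CHI if r == i)


print(chain_next("d", "pErr", 1))                             # True
print([i for i in range(1, 12) if chain_next("u", "exc", i)])  # [2, 3, 4]
print(chain_back("u", "call", 6))                             # True
```

Besides the call positions 3 and 4, the upward chain next of exc also holds in pos. 2 under the formal definition, because han and exc are in the $\doteq$ relation.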


c) Until/Since Operators. POTL has two kinds of until and since operators.
They express properties on paths, which are sequences of positions obtained by
iterating the different kinds of next or back operators. In general, a path of length
$n \in \mathbb{N}$ between $i, j \in U$ is a sequence of positions $i = i_1 < i_2 < \dots < i_n = j$.
The until operator on a set of paths $\Gamma$ is defined as follows: for any word $w$ and
position $i \in U$, and for any two POTL formulas $\varphi$ and $\psi$, $(w, i) \models \varphi \,\mathcal{U}(\Gamma)\, \psi$
iff there exist a position $j \in U$, $j \geq i$, and a path $i_1 < i_2 < \dots < i_n$ between
$i$ and $j$ in $\Gamma$ such that $(w, i_k) \models \varphi$ for any $1 \leq k < n$, and $(w, i_n) \models \psi$. Since
operators are defined symmetrically. Note that, depending on $\Gamma$, a path from
$i$ to $j$ may not exist. We define until/since operators by associating them with
different sets of paths.

The summary until $\psi \,\mathcal{U}_\chi^t\, \theta$ (resp. since $\psi \,\mathcal{S}_\chi^t\, \theta$) operator is obtained by
inductively applying the $\bigcirc^t$ and $\chi_F^t$ (resp. $\ominus^t$ and $\chi_P^t$) operators. It holds in a position
in which either $\theta$ holds, or $\psi$ holds together with $\bigcirc^t(\psi \,\mathcal{U}_\chi^t\, \theta)$ (resp. $\ominus^t(\psi \,\mathcal{S}_\chi^t\, \theta)$)
or $\chi_F^t(\psi \,\mathcal{U}_\chi^t\, \theta)$ (resp. $\chi_P^t(\psi \,\mathcal{S}_\chi^t\, \theta)$). It is an until operator on paths that can move
not only between consecutive positions, but also between contexts of a chain,
skipping its body. With the OPM of Fig. 1, this means skipping function bodies.
The downward variants can move between positions at the same level in the ST
(i.e., in the same simple chain body), or down in the nested chain structure. The
upward ones remain at the same level, or move to higher levels of the ST.

Formula $\top \,\mathcal{U}_\chi^u\, \mathbf{exc}$ is true in positions contained in the frame of a function
that is terminated by an exception. It is true in pos. 3 of Fig. 3 because of path
3-6, and false in pos. 1, because no path can enter the chain whose contexts are
pos. 1 and 11. Formula $\top \,\mathcal{U}_\chi^d\, \mathbf{exc}$ is true in call positions whose function frame
contains excs, but that are not necessarily terminated by one of them, such as
the one in pos. 1 (with path 1-2-6).

We define Downward Summary Paths (DSP) as follows. Given an OP word
$w$, and two positions $i \leq j$ in $w$, the DSP between $i$ and $j$, if it exists, is a
sequence of positions $i = i_1 < i_2 < \dots < i_n = j$ such that, for each $1 \leq p < n$,

$$i_{p+1} = \begin{cases} k & \text{if } k = \max\{h \mid h \leq j \land \chi(i_p, h) \land (i_p \lessdot h \lor i_p \doteq h)\} \text{ exists;} \\ i_p + 1 & \text{otherwise, if } i_p \lessdot (i_p + 1) \text{ or } i_p \doteq (i_p + 1). \end{cases}$$

The Downward Summary (DS) until and since operators $\mathcal{U}_\chi^d$ and $\mathcal{S}_\chi^d$ use as $\Gamma$ the
set of DSP starting in the position in which they are evaluated. The definition
for the upward counterparts is, again, obtained by substituting $\lessdot$ with $\gtrdot$. In
Fig. 3, $\mathbf{call} \,\mathcal{U}_\chi^d\, (\mathbf{ret} \land p_{Err})$ holds in pos. 1 because of path 1-7-8 and 1-9-10,
$(\mathbf{call} \lor \mathbf{exc}) \,\mathcal{S}_\chi^u\, p_B$ in pos. 7 because of path 3-6-7, and $(\mathbf{call} \lor \mathbf{exc}) \,\mathcal{U}_\chi^u\, \mathbf{ret}$ in 3
because of path 3-6-7-8.
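
The definition of summary paths translates almost literally into a loop that, at each step, either jumps to the farthest admissible chain context or moves to the next position. The sketch below illustrates this on the word of Fig. 3; `CHI` is written out by hand from the chains of that word, the delimiter relations are assumptions, and `summary_path` and `until` are names introduced here.

```python
WORD = ["#", "call", "han", "call", "call", "call", "exc",
        "call", "ret", "call", "ret", "ret", "#"]
PROC = {1: "pA", 3: "pB", 4: "pC", 5: "pC", 7: "pErr", 8: "pErr",
        9: "pErr", 10: "pErr", 11: "pA"}
PR = {
    ("call", "call"): "<", ("call", "han"): "<", ("han", "call"): "<",
    ("call", "ret"): "=", ("han", "exc"): "=",
    ("call", "exc"): ">", ("exc", "call"): ">",
    ("ret", "call"): ">", ("ret", "ret"): ">",
}
CHI = {(0, 12), (1, 7), (1, 9), (1, 11), (2, 6), (3, 6), (4, 6)}
ALLOWED = {"d": {"<", "="}, "u": {">", "="}}


def rel(i, j):
    """PR between positions i and j (delimiter handling is an assumption)."""
    a, b = WORD[i], WORD[j]
    if a == "#" and b == "#":
        return "="
    if a == "#":
        return "<"
    if b == "#":
        return ">"
    return PR[(a, b)]


def summary_path(t, i, j):
    """Downward (t='d') or upward (t='u') summary path from i to j, or None."""
    path = [i]
    while path[-1] < j:
        cur = path[-1]
        targets = [h for (g, h) in CHI
                   if g == cur and h <= j and rel(cur, h) in ALLOWED[t]]
        if targets:
            path.append(max(targets))       # skip a chain body
        elif rel(cur, cur + 1) in ALLOWED[t]:
            path.append(cur + 1)            # move to the next position
        else:
            return None
    return path


def until(t, phi, psi, i):
    """Summary until: phi and psi are predicates on positions."""
    for j in range(i, len(WORD)):
        path = summary_path(t, i, j)
        if path and all(phi(k) for k in path[:-1]) and psi(path[-1]):
            return True
    return False


print(summary_path("d", 1, 8), summary_path("d", 1, 10))   # [1, 7, 8] [1, 9, 10]
print(summary_path("u", 3, 8), summary_path("u", 1, 6))    # [3, 6, 7, 8] None
```

The paths printed are the ones quoted in the examples, and the missing upward path from pos. 1 to pos. 6 is the reason why the upward summary until of exc fails in pos. 1.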


d) Hierarchical Operators. A single position may be the left or right context
of multiple chains. The operators seen so far cannot take this fact into account,
since they “forget” about a left context when they jump to the right one. Thus,
we introduce the hierarchical next and back operators. The upward hierarchical
next (resp. back), $\bigcirc_H^u \psi$ (resp. $\ominus_H^u \psi$), is true iff the current position $j$ is the right
context of a chain whose left context is $i$, and $\psi$ holds in the next (resp. previous)
pos. $j'$ that is the right context of $i$, with $i \lessdot j, j'$. So, $\bigcirc_H^u\, p_{Err}$ holds in pos. 7 of
Fig. 3 because $p_{Err}$ holds in 9, and $\ominus_H^u\, p_{Err}$ in 9 because $p_{Err}$ holds in 7. In the
ST, $\bigcirc_H^u$ goes up between calls to $p_{Err}$, while $\ominus_H^u$ goes down. Their downward
counterparts behave symmetrically, and consider multiple inner chains sharing
their right context. They are formally defined as:


- $(w, i) \models \bigcirc_H^u \varphi$ iff there exist a position $h < i$ s.t. $\chi(h, i)$ and $h \lessdot i$ and a
  position $j = \min\{k \mid i < k \land \chi(h, k) \land h \lessdot k\}$ and $(w, j) \models \varphi$;

- $(w, i) \models \ominus_H^u \varphi$ iff there exist a position $h < i$ s.t. $\chi(h, i)$ and $h \lessdot i$ and a
  position $j = \max\{k \mid k < i \land \chi(h, k) \land h \lessdot k\}$ and $(w, j) \models \varphi$;

- $(w, i) \models \bigcirc_H^d \varphi$ iff there exist a position $h > i$ s.t. $\chi(i, h)$ and $i \gtrdot h$ and a
  position $j = \min\{k \mid i < k \land \chi(k, h) \land k \gtrdot h\}$ and $(w, j) \models \varphi$;

- $(w, i) \models \ominus_H^d \varphi$ iff there exist a position $h > i$ s.t. $\chi(i, h)$ and $i \gtrdot h$ and a
  position $j = \max\{k \mid k < i \land \chi(k, h) \land k \gtrdot h\}$ and $(w, j) \models \varphi$.


In the ST of Fig. 4, $\bigcirc_H^d$ and $\ominus_H^d$ go down and up among calls terminated by the
same exc. For example, in pos. 3 $\bigcirc_H^d\, p_C$ holds, because both pos. 3 and 4 are
in the chain relation with 6. Similarly, in pos. 4 $\ominus_H^d\, p_B$ holds. Note that these
operators do not consider leftmost/rightmost contexts, so $\bigcirc_H^u\, \mathbf{ret}$ is false in pos.
9, as $\mathbf{call} \doteq \mathbf{ret}$, and pos. 11 is the rightmost context of pos. 1.
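
The four hierarchical next and back operators differ only in the direction of the shared context and in the choice of minimum or maximum, which the following sketch makes explicit. It is an illustration on the word of Fig. 3: `CHI` is written out by hand, the delimiter relations are assumptions, and the single function `hierarchical` with its `kind` parameter is a hypothetical interface.

```python
WORD = ["#", "call", "han", "call", "call", "call", "exc",
        "call", "ret", "call", "ret", "ret", "#"]
PROC = {1: "pA", 3: "pB", 4: "pC", 5: "pC", 7: "pErr", 8: "pErr",
        9: "pErr", 10: "pErr", 11: "pA"}
PR = {
    ("call", "call"): "<", ("call", "han"): "<", ("han", "call"): "<",
    ("call", "ret"): "=", ("han", "exc"): "=",
    ("call", "exc"): ">", ("exc", "call"): ">",
    ("ret", "call"): ">", ("ret", "ret"): ">",
}
CHI = {(0, 12), (1, 7), (1, 9), (1, 11), (2, 6), (3, 6), (4, 6)}


def rel(i, j):
    """PR between positions i and j (delimiter handling is an assumption)."""
    a, b = WORD[i], WORD[j]
    if a == "#" and b == "#":
        return "="
    if a == "#":
        return "<"
    if b == "#":
        return ">"
    return PR[(a, b)]


def holds(label, i):
    return WORD[i] == label or PROC.get(i) == label


def hierarchical(kind, label, i):
    """kind is one of 'next_up', 'back_up', 'next_down', 'back_down'."""
    step, direction = kind.split("_")
    for (left, right) in CHI:
        if direction == "up" and right == i and rel(left, i) == "<":
            # right contexts sharing the left context `left`, in the yields relation
            siblings = [k for (g, k) in CHI if g == left and rel(left, k) == "<"]
        elif direction == "down" and left == i and rel(i, right) == ">":
            # left contexts sharing the right context `right`, in the takes relation
            siblings = [k for (k, g) in CHI if g == right and rel(k, right) == ">"]
        else:
            continue
        candidates = [k for k in siblings if (k > i if step == "next" else k < i)]
        if candidates:
            j = min(candidates) if step == "next" else max(candidates)
            if holds(label, j):
                return True
    return False


print(hierarchical("next_up", "pErr", 7), hierarchical("back_up", "pErr", 9))
print(hierarchical("next_down", "pC", 3), hierarchical("back_down", "pB", 4))
print(hierarchical("next_up", "ret", 9))
```

The first four calls return True and the last one False, matching the examples: pos. 11 is excluded for pos. 9 because its left context 1 is in the $\doteq$ relation with it.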

The hierarchical until and since operators are defined by iterating these next
and back operators. The upward hierarchical path (UHP) between $i$ and $j$ is a
sequence of positions $i = i_1 < i_2 < \dots < i_n = j$ such that there exists a position
$h < i$ such that for each $1 \leq p \leq n$ we have $\chi(h, i_p)$ and $h \lessdot i_p$, and for each
$1 \leq q < n$ there exists no position $k$ such that $i_q < k < i_{q+1}$ and $\chi(h, k)$. The
until and since operators based on the set of UHP starting in the position in
which they are evaluated are denoted as $\mathcal{U}_H^u$ and $\mathcal{S}_H^u$. E.g., $\mathbf{call} \,\mathcal{U}_H^u\, p_{Err}$ holds
in pos. 7 because of the singleton path 7 and path 7-9, and $\mathbf{call} \,\mathcal{S}_H^u\, p_{Err}$ in pos. 9
because of paths 9 and 7-9.

The downward hierarchical path (DHP) between $i$ and $j$ is a sequence of
positions $i = i_1 < i_2 < \dots < i_n = j$ such that there exists a position $h > j$ such
that for each $1 \leq p \leq n$ we have $\chi(i_p, h)$ and $i_p \gtrdot h$, and for each $1 \leq q < n$
there exists no position $k$ such that $i_q < k < i_{q+1}$ and $\chi(k, h)$. The until and
since operators based on the set of DHP starting in the position in which they
are evaluated are denoted as $\mathcal{U}_H^d$ and $\mathcal{S}_H^d$. In Fig. 3, $\mathbf{call} \,\mathcal{U}_H^d\, p_C$ holds in pos. 3,
and $\mathbf{call} \,\mathcal{S}_H^d\, p_B$ in pos. 4, both because of path 3-4.

The POTL until and since operators enjoy expansion laws similar to those
of LTL. Here we give those for two until operators, those for their since and
downward counterparts being symmetric.


$$\varphi \,\mathcal{U}_\chi^t\, \psi \equiv \psi \lor \big(\varphi \land (\bigcirc^t(\varphi \,\mathcal{U}_\chi^t\, \psi) \lor \chi_F^t(\varphi \,\mathcal{U}_\chi^t\, \psi))\big)$$
$$\varphi \,\mathcal{U}_H^u\, \psi \equiv (\psi \land \chi_P^d \top \land \neg\chi_P^u \top) \lor \big(\varphi \land \bigcirc_H^u(\varphi \,\mathcal{U}_H^u\, \psi)\big)$$


3.1 Expressiveness of POTL 


We first define some derived operators. For $t \in \{d, u\}$, we define the
downward/upward summary eventually as $\Diamond^t \varphi := \top \,\mathcal{U}_\chi^t\, \varphi$, and the downward/upward
summary globally as $\Box^t \varphi := \neg\Diamond^t(\neg\varphi)$. $\Diamond^u \varphi$ and $\Box^u \varphi$ resp. say that $\varphi$ holds in
one or all positions in the path from the current position to the root of the ST.
$\Diamond^d \varphi$ says that $\varphi$ holds in at least one position in the current subtree, and $\Box^d \varphi$
in all of them. E.g., if $\Box^d(\neg p_A)$ holds in a call, it means that $p_A$ never holds in
its whole function body, which is the subtree rooted next to the call.
In the technical report, we prove


Theorem 1 ([23]). POTL = FOL with one free variable on OP words.


Equivalence to FOL on the relevant algebraic structure is a desirable feature of
linear-time temporal logics, and it was proved for LTL [39] and NWTL [2]. It
is in some sense a theoretical assurance of the sufficient expressive power of the
logic. Moreover, NWTL $\subset$ OPTL was proved in [22], and OPTL $\subseteq$ POTL comes
from Theorem 1 and the semantics of OPTL being expressible in FOL. In [23],
we also prove that there exist POTL formulas not expressible in OPTL. Thus,
we can claim CaRet [6] $\subseteq$ NWTL $\subset$ OPTL $\subset$ POTL. One of such formulas is
$\Diamond^d p_A$ which, evaluated e.g. on a han position with a matched exc, states that
$p_A$ holds in one of the positions in the same subtree.

More importantly, POTL can express many useful requirements of proce- 
dural programs. To emphasize the potential practical applications in automatic 
verification, we supply a few examples of typical program properties expressed as 
POTL formulas, not all of them being expressible in the other above languages. 

The LTL globally can be written as $\Box\psi := \neg\Diamond^u(\Diamond^d \neg\psi)$. The two nested
eventually operators enumerate all future positions by going up and then down
in any direction in the syntax tree: when negated, this means $\neg\psi$ may never
hold. POTL can express Hoare-style pre/postconditions with formulae such as
$\Box(\mathbf{call} \land \rho \implies \chi_F^d(\mathbf{ret} \land \theta))$, where $\rho$ is the precondition, and $\theta$ is the
postcondition.
Unlike NWTL, POTL can easily express properties related to exception
handling and interrupt management [43]. E.g., the shortcut
$\mathrm{CallThr}(\psi) := \bigcirc^u(\mathbf{exc} \land \psi) \lor \chi_F^u(\mathbf{exc} \land \psi)$, evaluated in a call, states that the procedure currently started
is terminated by an exc in which $\psi$ holds. So, $\Box(\mathbf{call} \land \rho \land \mathrm{CallThr}(\top) \implies
\mathrm{CallThr}(\theta))$ means that if precondition $\rho$ holds when a procedure is called, then
postcondition $\theta$ must hold if that procedure is terminated by an exception. In
object oriented programming languages, if $\rho \equiv \theta$ is a class invariant asserting that
a class instance’s state is valid, this formula expresses weak exception safety [1],
and strong exception safety if $\rho$ and $\theta$ express particular states of the class instance.
The no-throw guarantee can be stated with $\Box(\mathbf{call} \land p_A \implies \neg\mathrm{CallThr}(\top))$,
meaning procedure $p_A$ is never interrupted by an exception.

Stack inspection [29,37], i.e. properties regarding the sequence of procedures
active in the program’s stack at a certain point of its execution, is an important
class of requirements that can be expressed with shortcut
$\mathrm{Scall}(\varphi, \psi) := (\mathbf{call} \implies \varphi) \,\mathcal{S}_\chi^d\, (\mathbf{call} \land \psi)$, which subsumes the call since of CaRet, as it also
works with exceptions. E.g., $\Box((\mathbf{call} \land p_B \land \mathrm{Scall}(\top, p_A)) \implies \mathrm{CallThr}(\top))$
means that whenever $p_B$ is executed and at least one instance of $p_A$ is on the
stack, $p_B$ is terminated by an exception. The OPA of Fig. 2 satisfies this formula,
because $p_B$ is called by $p_A$, and $p_C$ throws.


4 Model Checking 


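Given an OP alphabet $(\mathcal{P}(AP), M_{AP})$, where $AP$ is a finite set of atomic
propositions, and a POTL formula $\varphi$, we build an OPA $\mathcal{A}_\varphi = \langle \mathcal{P}(AP), M_{AP}, Q, I, F, \delta \rangle$
that accepts models of $\varphi$. The construction of $\mathcal{A}_\varphi$ resembles the classical one for
LTL and the ones for NWTL and OPTL, diverging from them significantly when
dealing with temporal obligations that involve positions in the chain relation.
We first introduce $Cl(\varphi)$, the closure of $\varphi$, containing all subformulas of
$\varphi$, and some auxiliary operators. The latter are needed to model-check chain
next and back operators. For any PR $\pi \in \{\lessdot, \doteq, \gtrdot\}$, we define them as follows:

$(w, i) \models \chi_F^\pi \varphi$ iff there exists $j > i$ such that $\chi(i, j)$, $i \,\pi\, j$, and $(w, j) \models \varphi$;

$(w, i) \models \chi_P^\pi \varphi$ iff there exists $j < i$ such that $\chi(j, i)$, $j \,\pi\, i$, and $(w, j) \models \varphi$.

$Cl(\varphi)$ is the smallest set such that, for $t \in \{d, u\}$:


1. $\varphi \in Cl(\varphi)$,

2. $AP \subseteq Cl(\varphi)$,

3. if $\psi \in Cl(\varphi)$ and $\psi \neq \neg\theta$, then $\neg\psi \in Cl(\varphi)$ (we identify $\neg\neg\psi$ with $\psi$);

4. if $\neg\psi \in Cl(\varphi)$, then $\psi \in Cl(\varphi)$;

5. if any of $\psi \land \theta$ or $\psi \lor \theta$ is in $Cl(\varphi)$, then $\psi, \theta \in Cl(\varphi)$;

6. if any of $\bigcirc^t \psi$, $\ominus^t \psi$, $\chi_F^t \psi$, or $\chi_P^t \psi$ is in $Cl(\varphi)$, then $\psi \in Cl(\varphi)$;

7. if $\chi_F^d \psi$ (resp. $\chi_F^u \psi$) is in $Cl(\varphi)$, then $\chi_F^{\lessdot} \psi$ (resp. $\chi_F^{\gtrdot} \psi$), $\chi_F^{\doteq} \psi$, $\chi_L$ are in it;

8. if $\chi_P^d \psi$ (resp. $\chi_P^u \psi$) is in $Cl(\varphi)$, then $\chi_P^{\lessdot} \psi$ (resp. $\chi_P^{\gtrdot} \psi$), $\chi_P^{\doteq} \psi$ are in it;

9. if any of $\psi \,\mathcal{U}_\chi^t\, \theta$, $\psi \,\mathcal{S}_\chi^t\, \theta$, $\psi \,\mathcal{U}_H^t\, \theta$, or $\psi \,\mathcal{S}_H^t\, \theta$ is in $Cl(\varphi)$, then $\psi, \theta \in Cl(\varphi)$;

10. if $\psi \,\mathcal{U}_\chi^t\, \theta \in Cl(\varphi)$, then $\bigcirc^t(\psi \,\mathcal{U}_\chi^t\, \theta), \chi_F^t(\psi \,\mathcal{U}_\chi^t\, \theta) \in Cl(\varphi)$ (since is symmetric).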

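The set $Atoms(\varphi)$ contains all consistent subsets of $Cl(\varphi)$, i.e. all $\Phi \subseteq Cl(\varphi)$ s.t.


- for every $\psi \in Cl(\varphi)$, $\psi \in \Phi$ iff $\neg\psi \notin \Phi$;
- $\psi \land \theta \in \Phi$, iff $\psi \in \Phi$ and $\theta \in \Phi$;
- $\psi \lor \theta \in \Phi$, iff $\psi \in \Phi$ or $\theta \in \Phi$, or both.


The consistency constraints on $Atoms(\varphi)$ will be augmented incrementally in
the following, for each operator.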
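
For a purely propositional formula, the closure and its atoms can be enumerated by brute force, which may help to see what the three consistency constraints select. The sketch below covers only the propositional rules (1 to 5 of the closure) and ignores all temporal operators; the tuple encoding of formulas and the function names are assumptions of this illustration.

```python
from itertools import combinations

# Formulas are tuples: ("ap", name), ("not", f), ("and", f, g), ("or", f, g).


def neg(f):
    """Negate f, identifying a double negation with the formula itself."""
    return f[1] if f[0] == "not" else ("not", f)


def closure(phi, ap):
    """Propositional part of Cl(phi): subformulas, atomic propositions, negations."""
    cl = {("ap", a) for a in ap}
    todo = [phi]
    while todo:
        f = todo.pop()
        if f in cl:
            continue
        cl.add(f)
        if f[0] != "ap":
            todo.extend(f[1:])
    return cl | {neg(f) for f in cl}


def atoms(cl):
    """All subsets of cl that satisfy the three consistency constraints."""
    formulas = sorted(cl, key=repr)
    result = []
    for size in range(len(formulas) + 1):
        for subset in combinations(formulas, size):
            s = set(subset)
            ok = all((f in s) != (neg(f) in s) for f in formulas)
            ok = ok and all((f in s) == (f[1] in s and f[2] in s)
                            for f in formulas if f[0] == "and")
            ok = ok and all((f in s) == (f[1] in s or f[2] in s)
                            for f in formulas if f[0] == "or")
            if ok:
                result.append(frozenset(s))
    return result


phi = ("and", ("ap", "a"), ("not", ("ap", "b")))
cl = closure(phi, {"a", "b"})
print(len(cl), len(atoms(cl)))   # 6 4
```

Each of the four atoms corresponds to one truth assignment to a and b, and exactly one of them contains the formula itself. The exhaustive enumeration is exponential in the size of the closure, so it is meant for tiny examples only.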

The set of states of $\mathcal{A}_\varphi$ is $Q = Atoms(\varphi)^2$, and its elements, which we
denote with Greek capital letters, are of the form $\Phi = (\Phi_c, \Phi_p)$, where $\Phi_c$ is the
set of formulas that hold in the current position, and $\Phi_p$ is the set of temporal
obligations. The latter keep track of arguments of temporal operators that must
be satisfied after a chain body, skipping it. The way they do so depends on the
transition relation $\delta$, which we also define incrementally. Each automaton state is
associated to word positions. So, for $(\Phi, a, \Psi) \in \delta_{push/shift}$, with $\Phi \in Atoms(\varphi)^2$
and $a \in \mathcal{P}(AP)$, we have $\Phi_c \cap AP = a$ (by $\Phi_c \cap AP$ we mean the set of atomic
propositions in $\Phi_c$). Pop moves do not read input symbols, and the automaton
remains at the same position when performing them: for any $(\Phi, \Theta, \Psi) \in \delta_{pop}$
we impose $\Phi_c = \Psi_c$. The initial set $I$ contains states of the form $(\Phi_c, \Phi_p)$, with
$\varphi \in \Phi_c$, and the final set $F$ states of the form $(\Psi_c, \Psi_p)$, s.t. $\Psi_c \cap AP = \{\#\}$
and $\Psi_c$ contains no future operators. We extend the construction to the most
important operators, leaving the others and correctness proofs to [21].


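Next/Back Operators. Let $(\Phi, a, \Psi) \in \delta_{shift} \cup \delta_{push}$, with $\Phi, \Psi \in Atoms(\varphi)^2$,
$a \in \mathcal{P}(AP)$, and let $b = \Psi_c \cap AP$: we have $\bigcirc^d \psi \in \Phi_c$ iff $\psi \in \Psi_c$ and either $a \lessdot b$
or $a \doteq b$. The constraints introduced for the $\ominus^d$ operator are symmetric, and for
their upward counterparts it suffices to replace $\lessdot$ with $\gtrdot$.

If $\chi_F^d \psi \in Cl(\varphi)$, for each $\Phi \in Atoms(\varphi)^2$ we impose that $\chi_F^d \psi \in \Phi_c$ iff
$\chi_F^{\lessdot} \psi \in \Phi_c$ or $\chi_F^{\doteq} \psi \in \Phi_c$. Analogous rules are defined for the upward and past
chain operators. The auxiliary symbol $\chi_L$ forces the current position to be the
first one of a chain body. Let the current state of the OPA be $\Phi \in Atoms(\varphi)^2$:
$\chi_L \in \Phi_p$ iff the next transition (i.e. the one reading the current position) is
a push. Formally, if $(\Phi, a, \Psi) \in \delta_{shift}$ or $(\Phi, \Theta, \Psi) \in \delta_{pop}$, for any $\Phi, \Theta, \Psi$ and
$a$, then $\chi_L \notin \Phi_p$. If $(\Phi, a, \Psi) \in \delta_{push}$, then $\chi_L \in \Phi_p$. For any initial state
$(\Phi_c, \Phi_p) \in I$, we have $\chi_L \in \Phi_p$ iff $\# \notin \Phi_c$.


step | input              | state                                                                              | stack                                | PR                                   | move
1    | call han exc ret # | $\Phi^0 = (\{\mathbf{call}, \chi_F^d \mathbf{ret}, \chi_F^{\doteq} \mathbf{ret}\}, \{\chi_L\})$ | $\bot$                               | $\# \lessdot \mathbf{call}$          | push
2    | han exc ret #      | $\Phi^1 = (\{\mathbf{han}\}, \{\chi_F^{\doteq} \mathbf{ret}, \chi_L\})$            | $[\mathbf{call}, \Phi^0]\bot$        | $\mathbf{call} \lessdot \mathbf{han}$ | push
3    | exc ret #          | $\Phi^2 = (\{\mathbf{exc}\}, \emptyset)$                                           | $[\mathbf{han}, \Phi^1][\mathbf{call}, \Phi^0]\bot$ | $\mathbf{han} \doteq \mathbf{exc}$ | shift
4    | ret #              | $\Phi^3 = (\{\mathbf{ret}\}, \emptyset)$                                           | $[\mathbf{exc}, \Phi^1][\mathbf{call}, \Phi^0]\bot$ | $\mathbf{exc} \gtrdot \mathbf{ret}$ | pop
5    | ret #              | $\Phi^4 = (\{\mathbf{ret}\}, \{\chi_F^{\doteq} \mathbf{ret}\})$                    | $[\mathbf{call}, \Phi^0]\bot$        | $\mathbf{call} \doteq \mathbf{ret}$  | shift
6    | #                  | $\Phi^5 = (\{\#\}, \emptyset)$                                                     | $[\mathbf{ret}, \Phi^0]\bot$         | $\mathbf{ret} \gtrdot \#$            | pop
7    | #                  | $\Phi^5 = (\{\#\}, \emptyset)$                                                     | $\bot$                               | -                                    | -


Fig. 5. Example accepting run of the automaton for $\chi_F^d\, \mathbf{ret}$.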


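If $\chi_F^{\doteq} \psi \in Cl(\varphi)$, its satisfaction is ensured by the following constraints on $\delta$:


1. Let $(\Phi, a, \Psi) \in \delta_{push/shift}$: then $\chi_F^{\doteq} \psi \in \Phi_c$ iff $\chi_F^{\doteq} \psi, \chi_L \in \Psi_p$;

2. let $(\Phi, \Theta, \Psi) \in \delta_{pop}$: then $\chi_F^{\doteq} \psi \notin \Phi_p$, and $\chi_F^{\doteq} \psi \in \Theta_p$ iff $\chi_F^{\doteq} \psi \in \Psi_p$;

3. let $(\Phi, a, \Psi) \in \delta_{shift}$: then $\chi_F^{\doteq} \psi \in \Phi_p$ iff $\psi \in \Phi_c$.

If $\chi_F^{\lessdot} \psi \in Cl(\varphi)$, $\chi_F^{\lessdot} \psi$ is allowed in the pending part of initial states, and we
add the following constraints:

4. Let $(\Phi, a, \Psi) \in \delta_{push/shift}$: then $\chi_F^{\lessdot} \psi \in \Phi_c$ iff $\chi_F^{\lessdot} \psi, \chi_L \in \Psi_p$;

5. let $(\Phi, \Theta, \Psi) \in \delta_{pop}$: then $\chi_F^{\lessdot} \psi \in \Theta_p$ iff $\chi_L \in \Psi_p$, and either $\chi_F^{\lessdot} \psi \in \Psi_p$ or
   $\psi \in \Phi_c$.


We illustrate how the construction works for $\chi_F^{\doteq}$ with the example of Fig. 5.
The OPA starts in state $\Phi^0$, with $\chi_F^d\, \mathbf{ret} \in \Phi^0_c$, and guesses that $\chi_F^d$ will be
fulfilled by $\chi_F^{\doteq}$, so $\chi_F^{\doteq}\, \mathbf{ret} \in \Phi^0_c$. call is read by a push move, resulting in state
$\Phi^1$. The OPA guesses the next move will be a push, so $\chi_L \in \Phi^1_p$. By rule 1, we
have $\chi_F^{\doteq}\, \mathbf{ret} \in \Phi^1_p$. The last guess is immediately verified by the next push (step
2-3). Thus, the pending obligation for $\chi_F^{\doteq}\, \mathbf{ret}$ is stored onto the stack in $\Phi^1$.
The OPA, then, reads exc with a shift, and pops the stack symbol containing
$\Phi^1$ (step 4-5). By rule 2, the temporal obligation is resumed in the next state
$\Phi^4$, so $\chi_F^{\doteq}\, \mathbf{ret} \in \Phi^4_p$. Finally, ret is read by a shift which, by rule 3, may occur
only if $\mathbf{ret} \in \Phi^4_c$. Rule 3 verifies the guess that $\chi_F^{\doteq}\, \mathbf{ret}$ holds in $\Phi^0$, and fulfills
the temporal obligation contained in $\Phi^4_p$, by preventing computations in which
$\mathbf{ret} \notin \Phi^4_c$ from continuing. Had the next transition been a pop (e.g. because there
was no ret and $\mathbf{call} \gtrdot \#$), the run would have been blocked by rule 2, preventing
the OPA from reaching an accepting state, and from emptying the stack.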


Summary Until and Since. The construction for these operators is based
on their expansion laws. For any $\Phi \in Atoms(\varphi)^2$, we have $\psi \,\mathcal{U}_\chi^t\, \theta \in \Phi_c$, with
$t \in \{d, u\}$ being a direction, iff either: 1. $\theta \in \Phi_c$, 2. $\bigcirc^t(\psi \,\mathcal{U}_\chi^t\, \theta), \psi \in \Phi_c$, or 3.
$\chi_F^t(\psi \,\mathcal{U}_\chi^t\, \theta), \psi \in \Phi_c$. The rules for since are symmetric.


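Hierarchical Operators. For the hierarchical operators, we do not give an
explicit OPA construction, but we rely on a translation into other POTL
operands. For each hierarchical operator $\eta$ in $\varphi$, we add a propositional
symbol $q_{(\eta)}$. The upward hierarchical operators consider the right contexts of chains
sharing the same left context. To distinguish such positions, we define formula
$\gamma_{L,\eta} := \chi_P^{\lessdot}\big(q_{(\eta)} \land \bigcirc(\Box\neg q_{(\eta)}) \land \ominus(\boxminus\neg q_{(\eta)})\big)$, where $\Box$ and $\boxminus$ are as in Sect. 3.1.
$\bigcirc$ and $\ominus$ are the LTL next and back operators, for which model checking can
be done as for $\bigcirc^d$ and $\ominus^d$, but removing the restrictions on PR. $\gamma_{L,\eta}$, evaluated
on a position $i$, asserts that $q_{(\eta)}$ holds in the unique position $h$ such that $\chi(h, i)$
and $h \lessdot i$. Thus, $q_{(\eta)}$ can be used to distinguish other positions $j$ such that
$\chi(h, j)$ and $h \lessdot j$, as $\chi_P^{\lessdot} q_{(\eta)}$ holds in them. The translations for future upward
hierarchical operators follow, the others being analogous.


$$\bigcirc_H^u \psi := \gamma_{L,\bigcirc_H^u \psi} \land \bigcirc\big((\neg\chi_P^{\lessdot} q_{(\bigcirc_H^u \psi)}) \,\mathcal{U}_\chi^u\, (\chi_P^{\lessdot} q_{(\bigcirc_H^u \psi)} \land \psi)\big)$$
$$\psi \,\mathcal{U}_H^u\, \theta := \gamma_{L,\psi \,\mathcal{U}_H^u\, \theta} \land \big(\chi_P^{\lessdot} q_{(\psi \,\mathcal{U}_H^u\, \theta)} \implies \psi\big) \,\mathcal{U}_\chi^u\, \big(\chi_P^{\lessdot} q_{(\psi \,\mathcal{U}_H^u\, \theta)} \land \theta\big)$$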


4.1 Model Checking for ω-Words


To perform model checking of a POTL formula $\varphi$ on OP ω-words, we build a
generalized ωOPBA $\mathcal{A}_\varphi^\omega = \langle \mathcal{P}(AP), M_{AP}, Q_\omega, I, \mathbf{F}, \delta \rangle$, where $Q_\omega = Atoms(\varphi)^2 \times
\mathcal{P}(Cl_{stack}(\varphi))$, which differs from the finite-word OPA only for the state set and
the acceptance condition. As in [2], the generalized Büchi acceptance condition
is a slight variation on the one shown in Sect. 2.1: $\mathbf{F}$ is the set of sets of Büchi
final states, and an ω-word is accepted iff at least one state from each one of the
sets contained in $\mathbf{F}$ is visited infinitely often during the computation.

In finite words, the stack is empty at the end of every accepting computation,
which implies the satisfaction of all temporal constraints tracked by the
pending part of stack symbols. In ωOPBAs, the stack may never be empty, and
symbols with a non-empty pending part may remain in it indefinitely, never
enforcing the satisfaction of the respective formulas. To overcome this issue, we
use $Atoms(\varphi)^2 \times \mathcal{P}(Cl_{stack}(\varphi))$, with $Cl_{stack}(\varphi) \subseteq Cl(\varphi)$, as the state set of
the ωOPBA. Such states have the form $\Phi = (\Phi_c, \Phi_p, \Phi_s)$, where $\Phi_c$ and $\Phi_p$ have
the same role as in the finite-word case, and $\Phi_s$ is the in-stack part of $\Phi$. All
rules previously defined for $\Phi_c$ and $\Phi_p$ remain the same. $\Phi_s$ contains elements of
$Cl_{stack}(\varphi)$ contained in any symbol currently on the stack. $Cl_{stack}(\varphi)$ contains
formulas in $Cl(\varphi)$ that use the stack to ensure the satisfaction of future temporal
requirements, namely all $\chi_F^\pi \psi \in Cl(\varphi)$, with $\pi \in \{\lessdot, \doteq, \gtrdot\}$. Thus, pending
temporal obligations are moved from the stack to the ωOPBA state, and they
can be considered by the Büchi acceptance condition.

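Suppose we want to model check $\chi_F^{\doteq} \psi$. Formula $\chi_F^{\doteq} \psi$ must be inserted in the
in-stack part of the current state whenever a stack symbol containing it in its
pending part is pushed. It must be kept in the in-stack part of the current state
until the last stack symbol containing it in its pending part is popped, marking
the satisfaction of its temporal requirement. Then, it is possible to define an
acceptance set $F_{\chi_F^{\doteq} \psi} \in \mathbf{F}$, as the set of states not containing $\chi_F^{\doteq} \psi$ in any part.
Figure 6 shows an ωOPBA run of this kind. Notice that after step 7 $\chi_F^{\doteq} \psi$ does
not appear in any state’s in-stack part, so the run is accepting.


step | input                                  | state                                                                                                   | stack                                                      | PR
1    | call call han exc ret ret (call)^ω     | $\Phi^0 = (\{\mathbf{call}, \chi_F^d \mathbf{ret}, \chi_F^{\doteq} \mathbf{ret}\}, \{\chi_L\}, \emptyset)$ | $\bot$                                                     | $\lessdot$
2    | call han exc ret ret (call)^ω          | $\Phi^1 = (\{\mathbf{call}, \chi_F^d \mathbf{ret}, \chi_F^{\doteq} \mathbf{ret}\}, \{\chi_L, \chi_F^{\doteq} \mathbf{ret}\}, \emptyset)$ | $[\mathbf{call}, \Phi^0]\bot$ | $\lessdot$
3    | han exc ret ret (call)^ω               | $\Phi^2 = (\{\mathbf{han}\}, \{\chi_L, \chi_F^{\doteq} \mathbf{ret}\}, \{\chi_F^{\doteq} \mathbf{ret}\})$ | $[\mathbf{call}, \Phi^1][\mathbf{call}, \Phi^0]\bot$       | $\lessdot$
4    | exc ret ret (call)^ω                   | $\Phi^3 = (\{\mathbf{exc}\}, \emptyset, \{\chi_F^{\doteq} \mathbf{ret}\})$                              | $[\mathbf{han}, \Phi^2][\mathbf{call}, \Phi^1][\mathbf{call}, \Phi^0]\bot$ | $\doteq$
5    | ret ret (call)^ω                       | $\Phi^4 = (\{\mathbf{ret}\}, \emptyset, \{\chi_F^{\doteq} \mathbf{ret}\})$                              | $[\mathbf{exc}, \Phi^2][\mathbf{call}, \Phi^1][\mathbf{call}, \Phi^0]\bot$ | $\gtrdot$
6    | ret ret (call)^ω                       | $\Phi^5 = (\{\mathbf{ret}\}, \{\chi_F^{\doteq} \mathbf{ret}\}, \{\chi_F^{\doteq} \mathbf{ret}\})$       | $[\mathbf{call}, \Phi^1][\mathbf{call}, \Phi^0]\bot$       | $\doteq$
7    | ret (call)^ω                           | $\Phi^4$                                                                                                | $[\mathbf{ret}, \Phi^1][\mathbf{call}, \Phi^0]\bot$        | $\gtrdot$
8    | ret (call)^ω                           | $\Phi^6 = (\{\mathbf{ret}\}, \{\chi_F^{\doteq} \mathbf{ret}\}, \emptyset)$                              | $[\mathbf{call}, \Phi^0]\bot$                              | $\doteq$
9    | (call)^ω                               | $\Phi^7 = (\{\mathbf{call}\}, \emptyset, \emptyset)$                                                    | $[\mathbf{ret}, \Phi^0]\bot$                               | $\gtrdot$


Fig. 6. Prefix of an accepting run of the automaton for $\chi_F^d\, \mathbf{ret}$.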

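This construction is formalized as follows. Let $\psi \in \mathrm{Cl}_{stack}(\varphi)$. We add a few
constraints on the transition relations. For any $\Phi, \Theta, \Psi \in Q_\omega$ and $a \in \mathcal{P}(AP)$:


6. let $(\Phi, a, \Theta) \in \delta_{push}$: if $\psi \in \Phi_p$, then $\psi \in \Theta_s$;
7. let $(\Phi, a, \Theta) \in \delta_{push/shift}$: if $\psi \in \Phi_s$, then $\psi \in \Theta_s$;
8. let $(\Phi, \Theta, \Psi) \in \delta_{pop}$: if $\psi \in \Phi_s$ and $\psi \in \Theta_s$, then $\psi \in \Psi_s$.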


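An acceptance condition for summary until operators is also needed. For
$\psi\,\mathcal{U}^{d}_{\chi}\,\theta \in \mathrm{Cl}(\varphi)$, we add an acceptance set $F_{\psi\,\mathcal{U}^{d}_{\chi}\,\theta}$ such that for any $\Phi$ in it we
have $\chi_F^{\lessdot}(\psi\,\mathcal{U}^{d}_{\chi}\,\theta), \chi_F^{\doteq}(\psi\,\mathcal{U}^{d}_{\chi}\,\theta) \notin \Phi_s$, and either $\psi\,\mathcal{U}^{d}_{\chi}\,\theta \notin \Phi_c$ or $\theta \in \Phi_c$. The
condition for $\psi\,\mathcal{U}^{u}_{\chi}\,\theta$ is symmetric.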


4.2 Complexity 


The set $\mathrm{Cl}(\varphi)$ is linear in $|\varphi|$, the length of $\varphi$. $\mathrm{Atoms}(\varphi)$ has size at most
$2^{|\mathrm{Cl}(\varphi)|} = 2^{O(|\varphi|)}$, and the size of the set of states is the square of that in the
finite case, and is bounded by its cube in the ω-case. Moreover, the use of the
equivalences for the hierarchical operators causes only a linear increase in the
length of $\varphi$. Therefore,


Theorem 2. Given a POTL formula $\varphi$, it is possible to build an OPA or an
ωOPBA $\mathcal{A}_\varphi$ accepting the language denoted by $\varphi$ with at most $2^{O(|\varphi|)}$ states.


$\mathcal{A}_\varphi$ can then be intersected [42] with an OPA/ωOPBA modeling a program (e.g.
Fig. 2), and emptiness can be decided with summarization techniques [4].
Table 1. Results of the evaluation. ‘# states’ refers to the OPA to be verified.


#  Benchmark name            # states  Time (ms)  Memory (KiB)        Result
                                                  Total     MC only
1  Generic (Fig. 2)                12        867   70,040    10,166   True
2  Generic medium                  24        673   70,064     4,043   False
3  Generic larger                  30      1,014   70,063    14,160   True
4  Jensen                          42        305   70,050     3,154   True
5  Unsafe stack                    63      1,493  109,610    43,177   False
6  Safe stack                      77        637   70,089     7,234   True
7  Unsafe stack neutrality         63      5,286  383,312   167,654   True
8  Safe stack neutrality           77        840   70,077    16,773   True


5 Experimental Evaluation 


We implemented the OPA construction of Sect. 4 in an explicit-state model 
checking tool called POMC. The tool is written in Haskell [45], a purely func- 
tional, statically typed programming language with lazy evaluation. POMC 
checks OPA for emptiness by checking the reachability of an accepting con- 
figuration, by means of a modified DFS of the transition relation. This algo- 
rithm, similar to the one in [9], exploits the fact that all transitions only con- 
sider the topmost stack symbol, so reachability is actually computed only for 
semi-configurations made of one stack symbol and one state. Each time a chain 
support is explored, its ending semi-configuration is saved and associated with 
the starting one, so the next time the latter is reached, the support does not have 
to be re-explored. This allows the algorithm to exploit the cyclicities of OPA to 
terminate after having explored the whole transition relation. Given a POTL
specification $\varphi$ and an OPA $\mathcal{A}$ to be checked, POMC executes the reachability
algorithm, generating the product between $\mathcal{A}$ and the OPA for $\neg\varphi$ on-the-fly.
The present prototype of POMC only supports finite-word model checking; its
extension to deal with ω-languages is under development.

We checked with POMC several requirements on three case studies and we
report the results in Table 1. Some additional formulas we checked are in Table 2.
Such results can be reproduced through a publicly available artifact
(https://doi.org/10.5281/zenodo.4723741). The experiments were executed on a
laptop with a 2.2 GHz Intel processor and 15 GiB of RAM, running Ubuntu
GNU/Linux 20.04. In the tables, by “Total” memory we mean the maximum resident
memory including the Haskell runtime (which allocates 70 MiB by default), and by
“MC only” the maximum memory used by model checking as reported by the runtime.
Since model checking is polynomial in OPA size and exponential in formula length,
we focus on checking a variety of requirements, rather than large OPA.
Generic Procedural Program. We checked formula
$\Box((\mathrm{call} \land p_B \land \mathrm{Scall}(\top, p_A)) \implies \mathrm{CallThr}(\top))$


from Sect. 3.1 on the OPA of Fig. 2 (bench. 1), and also against two larger OPA 
(2, where the property does not hold, and 3, where it holds). 

We also checked the largest of such OPA against a set of formulas devised 
with the purpose of testing all POTL operators. The results are reported in 
Table 2. All formulas are checked very quickly, with only one outlier that runs 
out of memory. We ran the same experiment on a machine with a 2.0 GHz AMD 
CPU and 512 GiB of RAM running Debian GNU/Linux 10, obtaining a time of 
367s with a memory occupancy of 16.3 GiB. 


Stack Inspection. The security framework of the Java Development Kit (JDK) 
is based on stack inspection, i.e. the analysis of the contents of the program’s 
stack during the execution. The JDK provides method checkPermission(perm)
from class AccessController, which searches the stack for frames of functions 
that have not been granted permission perm. If any are found, an exception 
is thrown. Such permission checks prevent the execution of privileged code by 
unauthorized parts of the program, but they must be placed in sensitive points 
manually. Failure to place them appropriately may cause the unauthorized exe- 
cution of privileged code. An automated tool to check that no code can escape 
such checks is thus desirable. Any such tool would need the ability to model 
exceptions, as they are used to avoid code execution in case of security viola- 
tions. 
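The stack-inspection mechanism described above can be illustrated with a minimal sketch. It is written in Python rather than Java, and the names `Frame`, `check_permission`, `AccessDenied` and `read_balance` are illustrative assumptions that do not reproduce the JDK API: the check walks every frame on a modelled call stack and raises an exception as soon as one frame lacks the requested permission.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when some frame on the stack lacks the requested permission."""


@dataclass
class Frame:
    function: str
    permissions: frozenset = field(default_factory=frozenset)


def check_permission(stack, perm):
    """Walk the call stack from the top down; fail on the first frame without `perm`."""
    for frame in reversed(stack):
        if perm not in frame.permissions:
            raise AccessDenied(f"{frame.function} lacks {perm}")


def read_balance(stack):
    # The sensitive operation guards itself by inspecting the whole stack.
    frame = Frame("read_balance", frozenset({"CanPay"}))
    check_permission(stack + [frame], "CanPay")
    return 100  # hypothetical balance


# Hypothetical call stacks: every caller is trusted vs. one unprivileged caller.
trusted = [Frame("main", frozenset({"CanPay"}))]
untrusted = [Frame("main", frozenset({"CanPay"})), Frame("plugin")]
```

Calling `read_balance(trusted)` returns the balance, whereas `read_balance(untrusted)` raises `AccessDenied` because the `plugin` frame was never granted the permission. The exception is what prevents the privileged code from running, which is why a verification model needs to represent exceptions.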

[37] explains such needs by providing an example Java program for managing 
a bank account. It allows the user to check the account balance, and to withdraw 
money. To perform such tasks, the invoking program must have been granted 
permissions CanPay and Debit, respectively. We modeled this program as an
OPA (4), and proved that the program enforces these security measures effectively
by checking it against the formula


$\Box(\mathrm{call} \land \mathrm{read} \implies \neg(\top\ \mathcal{S}^{d}_{\chi}\ (\mathrm{call} \land \neg\mathrm{CanPay} \land \mathrm{read})))$


meaning that the account balance cannot be read if some function in the stack 
lacks the CanPay permission (a similar formula checks the Debit permission). 


Exception Safety. [53] is a tutorial on how to make exception-safe generic con- 
tainers in C++. It presents two implementations of a generic stack data struc- 
ture, parametric on the element type T. The first one is not exception-safe: if the 
constructor of T throws an exception during a pop action, the topmost element is 
removed, but it is not returned, and it is lost. This violates the strong exception 
safety requirement that each operation is rolled back if an exception is thrown. 
The second version of the data structure instead satisfies such requirement. 
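The difference between the two implementations can be sketched as follows. This is a Python analogue under stated assumptions, not the C++ code of [53]: the class `Fragile` stands in for an element type T whose copy may throw, and `UnsafeStack`, `SafeStack` and `size_after_failed_pop` are hypothetical names introduced for illustration.

```python
class CopyError(Exception):
    pass


class Fragile:
    """Element type whose copy operation can fail, like a throwing copy constructor."""

    def __init__(self, value, fail_on_copy=False):
        self.value = value
        self.fail_on_copy = fail_on_copy

    def copy(self):
        if self.fail_on_copy:
            raise CopyError("copy failed")
        return Fragile(self.value)


class UnsafeStack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        top = self.items.pop()   # the structure is modified first ...
        return top.copy()        # ... so a failure here loses `top`


class SafeStack(UnsafeStack):
    def pop(self):
        result = self.items[-1].copy()  # anything that may throw happens first
        self.items.pop()                # commit only once no exception can occur
        return result


def size_after_failed_pop(stack_cls):
    """Push two elements, let the pop of the second one fail, report the size."""
    stack = stack_cls()
    stack.push(Fragile(1))
    stack.push(Fragile(2, fail_on_copy=True))
    try:
        stack.pop()
    except CopyError:
        pass
    return len(stack.items)
```

With `UnsafeStack` the failed pop leaves one element behind, so the operation was not rolled back; with `SafeStack` both elements are still present. The safe variant follows the commit-last discipline: the modification happens only after every step that can throw.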

While exception safety is, in general, undecidable, it is possible to prove 
the stronger requirement that each modification to the data structure is only 
committed once no more exceptions can be thrown. We modeled both versions 
as OPA, and checked such requirement with the following formula: 


(exc => 7((O"modified V ypmodified) A (Stack :: push V Stack :: pop))) 
POMC successfully found a counterexample for the first implementation (5), and
proved the safety of the second one (6). 

Additionally, we proved that both implementations are exception neutral (7, 
8), i.e. Stack functions do not block exceptions thrown by the underlying type 
T. This was accomplished by checking the following formula: 


(exc A O"TA y$(han A x$Stack) => y$xbxhexc). 


Table 2. Results of the additional experiments on OPA “generic larger”.


Formula | Time (ms) | Memory Tot. (KiB) | Memory MC (KiB) | Result
xP Err | 1.1 | 70,095 | 175 | False
O4 (Ot (call A xexc)) | 21.0 | 70,095 | 1,290 | False
Of (han ^ (x4 (exc ^ x% call))) | 42.2 | 70,088 | 2,297 | False
(exc xpeall) | 10.7 | 70,099 | 839 | True
T US exc | 2.2 | 70,093 | 121 | False
04 (04 (T UE exc)) | 4.3 | 70,094 | 113 | False
((call A pa A (~ret ug WRx)) => xpexc) | 3,257.7 | 238,833 | 102,582 | True
o4(O“eall) | 0.7 | 70,094 | 139 | False
04(04(04%(E“call))) | 3.4 | 70,108 | 126 | False
x$ (Of (O©”call)) | 1.3 | 70,096 | 137 | False
((callA pa A CallThr(T)) = CallThr(ep)) | 7,793.7 | 402,420 | 173,639 | False
o(0%PB) | 2.1 | 70,097 | 114 | False
o(Of ps) | 2.8 | 70,097 | 114 | False
(pa A (callU$ po)) | 594.9 | 77,806 | 29,786 | True
(pc A (call S, pa)) | 676.6 | 96,296 | 37,949 | True
((po A xpexc) => (spa Sh PB)) | - | - | - | OOM
(callA pp => -po U% parr) | 198.2 | 70,088 | 10,606 | True
O(OHPErr) | 1.1 | 70,093 | 114 | False
O(OHPErr) | 1.2 | 70,089 | 114 | False
O(pa A (cally, pg)) | 10.3 | 70,105 | 115 | False
(pe A (call SH pa)) | 10.8 | 70,095 | 115 | False
(call xdret) | 3.0 | 70,095 | 112 | False
(call ~n O“ exc) | 1.9 | 70,106 | 113 | False
(call ^pa => 7CallThr(T)) | 110.7 | 70,094 | 4,937 | False
(exc ~(O” (call A pa) V xp (call A pa))) | 28.9 | 70,095 | 112 | False
((call A pg A (call sé (call A pa))) = > CallThr(T) | 926.1 | 70,104 | 13,310 | True
(han x pret) | 17.0 | 70,079 | 1,252 | True
T ux exc | 7.7 | 70,101 | 121 | True
of (04 (T Ux exc)) | 44.6 | 70,104 | 2,376 | True
04 (04 (04 (T Ux exc))) | 123.7 | 70,090 | 5,261 | False
(callA pc = > (T UY exc A x¢han)) | 92.9 | 70,096 | 1,346 | False
call ud (ret A perr) | 1.8 | 70,107 | 114 | False
x$ (call A ((call V exc) S% ps)) | 10.8 | 70,086 | 117 | False
04(04((call V exc) Uy’ ret)) | 5.3 | 70,094 | 114 | False
6 Conclusions


We introduced the temporal logic POTL, gave an automata-theoretic model 
checking procedure, and implemented it in a prototype tool. The results obtained 
in its experimental evaluation are promising. Additionally, POTL is proved to be 
FO-complete in a technical report [23]. We argue that the strong gain in expres- 
sive power w.r.t. previous approaches to model checking CFL, which comes with- 
out an increase in computational complexity, is worth the technicalities needed 
to achieve the present—and future—results. 

In the evaluation, we used models directly coded into OPAs. To ease user 
interaction with our tool, we additionally implemented a new input format based 
on a simple procedural language with exceptions and Boolean variables, which is 
automatically translated into OPA. Moreover, we are currently working on the
implementation of the model checking for ω-words, described in Sect. 4.1.

As a future research step, we also plan to develop user-friendly domain-specific
languages for specification, to show that OP languages and logics are suitable
in practice for program verification.


Acknowledgments. We are thankful to Davide Bergamaschi for developing an early 
POMC prototype, and to Francesco Pontiggia for implementing performance optimiza- 
tions. 
Abstract. Decoupled search is a state space search method originally introduced
in AI Planning. Similar to partial-order reduction methods, decoupled search
exploits the independence of components to tackle the state explosion problem.
Similar to symbolic representations, it does not construct the explicit state
space, but sets of states are represented in a compact manner, exploiting
component independence. Given the success of both partial-order reduction and
symbolic representations when model checking liveness properties, our goal is to
add decoupled search to the toolset of liveness checking methods. Specifically,
we show how decoupled search can be applied to liveness verification for
composed Büchi automata by adapting, and showing correct, a standard algorithm for
detecting lassos (i.e., infinite accepting runs), namely nested depth-first search.
We evaluate our approach using a prototype implementation.


1 Introduction 


Model checking is a well-known problem in formal verification. Given a formal description
of a system M, the model checking problem is to decide whether the system satisfies
a property $\varphi$. In contrast to safety properties, which can only express whether there
exists a finite run of the system that reaches a state with certain (bad) properties, live- 
ness properties can express good behaviours of the system that should occur repeatedly, 
i.e., infinite runs in which something good happens infinitely often. 

In this work, we consider a liveness verification problem that arises when composing
a set $\mathcal{A}^1,\dots,\mathcal{A}^n$ of non-deterministic Büchi automata (NBA), each with its
own acceptance condition. We recall that an accepting run for a single NBA is a lasso
$\rho_p(\rho_c)^\omega$ with a prefix $\rho_p$ and a cycle $\rho_c$ that visits an accepting state. For the
composition of a set of NBAs into an NBA we consider the following liveness property: a
composed run is accepting if there is a cycle visiting a state that is accepting for all
components. Such a general problem captures standard liveness verification problems
related to ω-regular properties. An archetypal example is automata-based LTL checking,
where system components are represented as NBAs and are composed with a property
monitor, represented as a Büchi automaton (often the negation of an LTL property).
In this case an accepting composed run witnesses a violation of a linear-time property. 

The predominant approach to address such verification problems using explicit 
state space search is to use nested depth-first search (NDFS) algorithms [5,22,32], 
also called double depth-first search, which perform on-the-fly checking of liveness
properties while composing the NBAs. NDFS, like all state space search methods, suf- 
fers from the state explosion problem. Various methods, such as partial-order reduc- 
tion [10,19,27,30,34], symbolic representations [2,28], symmetry reduction [7,23], 
or Petri-net unfolding [8,9] have been proposed to alleviate the state explosion prob- 
lem. Here, we add decoupled state space search [14], shortly decoupled search, as a 
new method for model checking liveness properties, complementary to the existing 
approaches. Indeed, as Gnad and Hoffmann [14,15] have shown, decoupled search 
complements these techniques in the sense that there exist cases where it yields expo- 
nentially stronger reductions. It has also been shown that decoupled search can be fruit- 
fully combined with partial-order reduction [16], symmetry reduction [18], and sym- 
bolic search [17]. 

Decoupled search has recently been introduced in AI planning [14], addressing goal 
reachability problems. Its applicability to model checking of safety properties has been 
shown in [12], where it was effectively introduced into the SPIN model checker [20]. 
However, the extension of decoupled search to cycle detection problems inherent to 
liveness model checking and NDFS algorithms has not yet been investigated. This paper 
addresses that investigation for the first time. 

Decoupled search exploits the independence of system components, similar to 
partial-order reduction techniques, by not enumerating all interleavings of transitions 
across components. Similar to symbolic representations, decoupled search does not 
construct the explicit state space of the product. Instead, search nodes, called decoupled 
states, symbolically represent sets of states. Each decoupled state compactly represents 
many global states and their closure up to internal transitions of individual components. 
Similar to partial-order reduction or symbolic search, decoupled search can be expo- 
nentially more efficient than explicit search of the state space, as shown for reachability 
problems in the domains of AI planning [14] and model checking [12]. 

The main contribution of our paper is to extend the scope of decoupled search from 
safety properties, as done in [12], to liveness properties. In particular, we adapt a stan- 
dard NDFS algorithm to the decoupled state representation. The resulting algorithms 
are able to solve the verification problem mentioned above, namely checking accep- 
tance of composed NBAs. The main technical challenge for the correctness of our algo- 
rithms was to identify the conditions that imply existence of accepting runs in decoupled 
search and to show how such runs can be constructed efficiently. 

We evaluate our decoupled NDFS algorithm using a prototype implementation on 
two showcase examples similar to the dining philosophers problem, and a set of ran- 
domly generated models. We compare to established tools, namely the SPIN model 
checker [20], and Petri-net unfolding with Cunf [30]. The results show that, like for 
safety properties, decoupled search can yield exponential advantages over state-of-the- 
art methods. In particular, its advantage grows with the degree to which components act 
independently of others, via internal transitions that do not affect other components. 

The rest of the paper is structured as follows. We start in Sect. 2 by recalling the
necessary background on NBAs, the verification problem we consider, and a standard
NDFS algorithm typically used to solve the problem. Sections 3-5 present our
contribution: Sect. 3 formalizes decoupled search in terms of composed NBAs, and shows
its desired properties; Sect. 4 discusses some issues that would arise in a naïve attempt
to (incorrectly) adapt it, and describes the (correct) adapted NDFS algorithm; Sect. 5
provides its correctness proof. In Sect. 6 we show our experimental evaluation, whose
code and models are publicly available at [13]. Section 7 concludes the paper discussing
related work and future research avenues.


2 Büchi Automata, Composition and Verification


This section recalls some basic notions of Büchi automata, their composition, the
verification problem we consider in this paper for such composition, and its standard
algorithmic resolution based on NDFS.


Büchi Automata and Accepting Runs. We start with the definition of non-deterministic
Büchi automata (NBA).


Definition 1 (Non-deterministic Büchi Automaton). A non-deterministic Büchi
automaton $\mathcal{A}$ is a tuple $\langle S, \to, L, s_0, A\rangle$, where $S$ is a finite set of states, $L$ is a finite set
of transition labels, ${\to} \subseteq S \times L \times S$ is a transition relation, $s_0 \in S$ is an initial state,
and $A \in (S \to \mathbb{B})$ is an acceptance function.


A run $\rho$ of an NBA is an infinite sequence of states $s_0, s_1, s_2, \dots \in S^\omega$ starting from
the initial state. The $i$-th state of a run $\rho$ is denoted by $\rho[i]$ and we will use the same
notation for other lists and sequences. A run $\rho$ is accepting if it traverses accepting
states infinitely often. Formally, $\exists^\infty j \in \mathbb{N} : A(\rho[j])$. We define a trace $\pi$ of a run
$\rho = s_0, s_1, s_2, \dots \in S^\omega$ as a sequence of labels $\pi = l_0, l_1, \dots \in L^\omega$ such that
$\forall i \in \mathbb{N} : (s_i, l_i, s_{i+1}) \in {\to}$. We will also consider finite runs $\rho \in S^*$ and finite traces $\pi \in L^*$.

As hinted in Sect. 1, the existence of accepting runs is interesting for several
theoretical and practical reasons. On the theoretical side, the language of an NBA is the
set of all traces $\pi$ in $L^\omega$ for which an accepting run $\rho$ exists such that
$\rho[i] \xrightarrow{\pi[i]} \rho[i+1]$ for all $i \in \mathbb{N}$. On the practical side, model checking ω-regular
properties, including LTL properties, can be reduced to checking the existence of
accepting runs. Such runs, indeed, provide witnesses or counterexamples for the
properties of interest.
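Definition 1 and the notion of an accepting lasso can be made concrete with a small sketch. The representation below (a frozen dataclass `NBA`, the helper `is_accepting_lasso`, and the three-state example automaton) is an assumption made for illustration and does not come from the paper's implementation: the function checks that a prefix followed by a repeated cycle is a run from the initial state that respects the given trace and visits an accepting state inside the cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NBA:
    states: frozenset
    labels: frozenset
    transitions: frozenset   # triples (source, label, target)
    initial: object
    accepting: frozenset     # states s with A(s) = true

    def has_step(self, s, label, t):
        return (s, label, t) in self.transitions


def is_accepting_lasso(nba, prefix, cycle, prefix_trace, cycle_trace):
    """Check that prefix . cycle^omega is an accepting run with the given trace.

    `prefix` and `cycle` are lists of states; the last state of the cycle must
    step back to its first state. Each trace has one label per step.
    """
    if not cycle:
        return False
    run = prefix + cycle + [cycle[0]]          # unroll the cycle once
    trace = prefix_trace + cycle_trace
    if run[0] != nba.initial or len(trace) != len(run) - 1:
        return False
    steps_ok = all(nba.has_step(run[i], trace[i], run[i + 1])
                   for i in range(len(trace)))
    return steps_ok and any(s in nba.accepting for s in cycle)


# Hypothetical example: 1 -a-> 2 -b-> 3 -c-> 2, with state 2 accepting.
example = NBA(
    states=frozenset({1, 2, 3}),
    labels=frozenset({"a", "b", "c"}),
    transitions=frozenset({(1, "a", 2), (2, "b", 3), (3, "c", 2)}),
    initial=1,
    accepting=frozenset({2}),
)
```

For the example, the prefix `[1]` with trace `["a"]` and the cycle `[2, 3]` with trace `["b", "c"]` form an accepting lasso, since state 2 is visited on every iteration of the cycle.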


Composition of NBAs. From now on we assume that the set of labels $L$ of an NBA is
partitioned into a set $L_I$ of internal labels and a set $L_G$ of global labels. The notion of
composition we use is based on (maximal) synchronisation on global labels. In words:
in every transition involving a global label, each component having the global label in
its set of labels must perform a local transition, while transitions with internal labels can
be performed independently. When composing NBAs we assume w.l.o.g. that they do
not share any internal label. Further, we assume that every global label is shared by at
least two component NBAs. Otherwise, such labels can be made internal. We will use
the following notation: for a set $\mathcal{A}^1,\dots,\mathcal{A}^n$ of NBAs, we use superscripting to denote
the components of each $\mathcal{A}^i$, i.e., we assume $\mathcal{A}^i = \langle S^i, \to^i, L^i = L^i_I \cup L^i_G, s^i_0, A^i\rangle$.
Definition 2 (Composition of NBAs). The composition of n NBAs $\mathcal{A}^1,\dots,\mathcal{A}^n$,
denoted by $\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$, is the NBA $\langle S, \to, L, s_0, A\rangle$, where $S = S^1 \times \dots \times S^n$,
$L = \bigcup_{i \in \{1,\dots,n\}} L^i$, $s_0 = (s_0^1,\dots,s_0^n)$, $A = \{(s_1,\dots,s_n) \mapsto \bigwedge_{i=1,\dots,n} A^i(s_i)\}$ and
$\to$ is the smallest set of transitions closed under the following rules for interleaving of
local transitions (1) and maximal synchronization on global labels (2):


(1) $\dfrac{s_i \xrightarrow{l_I}_i s_i' \qquad l_I \in L^i_I}{(s_1,\dots,s_i,\dots,s_n) \xrightarrow{l_I} (s_1,\dots,s_i',\dots,s_n)}$

(2) $\dfrac{l_G \in L_G \qquad \forall i \in \{1,\dots,n\} \mid l_G \in L^i_G : s_i \xrightarrow{l_G}_i s_i' \qquad \forall i \in \{1,\dots,n\} \mid l_G \notin L^i_G : s_i = s_i'}{(s_1,\dots,s_n) \xrightarrow{l_G} (s_1',\dots,s_n')}$


As a notational convention, we will denote component states simply by lower-case
letters, e.g. $s$, and composed states $(s_1,\dots,s_n) \in S$ by $\vec{s}$, i.e., as a vector, and similarly
for local runs $\rho$ (resp. traces $\pi$) and composed runs $\vec{\rho}$ (composed traces $\vec{\pi}$).
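The two rules of Definition 2 can be sketched as a successor function. The encoding is an assumption made for illustration: each component is a dictionary with the hypothetical keys `"internal"`, `"global"` and `"trans"`, and the components `A1` and `A2` are a made-up example rather than the automata of Fig. 1.

```python
from itertools import product


def successors(components, state):
    """Yield (label, successor) pairs of a composed state, following rules (1) and (2).

    Each component is a dict with keys "internal" and "global" (label sets) and
    "trans" (a set of (source, label, target) triples); `state` is a tuple.
    """
    # Rule (1): one component takes an internal step, all others stay put.
    for i, comp in enumerate(components):
        for (src, label, dst) in comp["trans"]:
            if src == state[i] and label in comp["internal"]:
                yield label, state[:i] + (dst,) + state[i + 1:]

    # Rule (2): every component that knows the global label must move on it.
    global_labels = set().union(*(c["global"] for c in components))
    for label in global_labels:
        options = []
        for i, comp in enumerate(components):
            if label in comp["global"]:
                targets = [d for (s, l, d) in comp["trans"]
                           if s == state[i] and l == label]
            else:
                targets = [state[i]]      # component not involved: unchanged
            options.append(targets)
        for combo in product(*options):   # empty if some participant is blocked
            yield label, tuple(combo)


# Hypothetical components sharing the global label "g".
A1 = {"internal": {"i1"}, "global": {"g"}, "trans": {(1, "i1", 2), (2, "g", 1)}}
A2 = {"internal": {"i2"}, "global": {"g"}, "trans": {("A", "g", "B"), ("B", "i2", "A")}}
```

From the composed state `(1, "A")` only the internal step of `A1` is possible, because `A1` cannot yet take part in `"g"`. From `(2, "A")` both components synchronise on `"g"` and reach `(1, "B")`.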

In Fig. 1 we illustrate a small example of a composition of two NBAs $\mathcal{A}^1, \mathcal{A}^2$. In
the top of the figure, we show the local state space of the two components ($\mathcal{A}^1$ left, $\mathcal{A}^2$
right), where the component states are $S^1 = \{1,2,3\}$, $S^2 = \{A, B\}$, and the labels are
defined as $L^1_G = L^2_G = \{l^1_G, l^2_G\}$, $L^1_I = \{l^1_I\}$, $L^2_I = \{l^2_I\}$. A local state is accepting
for $\mathcal{A}^1$, so $A^1(s) = \top$, iff $s = 2$, and similarly $A^2(s) = \top$ iff $s = B$. The initial states
are $s^1_0 = 1$ and $s^2_0 = A$. The transitions are as shown. In the bottom, we depict the
part of the state space of the composition $\mathcal{A}^1 \parallel \mathcal{A}^2$ reachable from $s_0 = (1, A)$ as it
would be generated by a standard DFS. Here, transitions via global labels synchronize
the components, internal transitions are executed independently. The states crossed out
would be pruned by duplicate checking, the underlined state is accepting.


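[Figure 1: local state spaces of the two components (top) and the reachable part of the
composed state space (bottom).]

Fig. 1. Example of two NBAs, $\mathcal{A}^1$ and $\mathcal{A}^2$, and the state space of their composition $\mathcal{A}^1 \parallel \mathcal{A}^2$.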

Verification Problem and Its Resolution with NDFS. The verification problem we
address in this paper is the existence of accepting runs in the composed NBA
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$. In words, we look for runs in $\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$ that infinitely often traverse
states in which all component NBAs are in an accepting state. We discuss alternative
acceptance conditions in Sect. 7.
Determining the existence of accepting runs in an NBA can be boiled down to the
existence of so-called lassos, i.e., finite sequences of states in the NBA of the form $\rho_p\rho_c$,
where $\rho_p$ is the prefix of the lasso and $\rho_c$ is the cycle of the lasso, which contains at
least one accepting state and closes the cycle (i.e., $\rho_p[|\rho_p| - 1] = \rho_c[|\rho_c| - 1]$). Such a
finite sequence of states represents an accepting run $\rho_p(\rho_c)^\omega$.

    CheckEmptiness(A^1 || ... || A^n):
        Stack ← (s_0)
        V ← ∅
        V' ← ∅
        DFS(s_0)
        return empty

    DFS(s):
        V := V ∪ {s}
        for all t s.t. s → t do
            if t ∈ V then continue
            push(Stack, t)
            DFS(t)
            pop(Stack)
        if A(s) then
            NestedDFS(s)
            V' := V' ∪ {s}

    NestedDFS(s):
        for all t s.t. s → t do
            if t ∈ V' then continue
            if t ∈ Stack then return cycle
            V' := V' ∪ {t}
            NestedDFS(t)

Fig. 2. A standard NDFS algorithm for lasso search in composed NBAs.

Several algorithms can be used to check the existence of lassos. The predominant 
family of algorithms are the variants of NDFS, originally introduced in [5]. Figure 2 
shows the pseudo-code for one such variant, based on NDFS as presented in [4]. The 
algorithm is based on an ordinary depth-first search algorithm (DFS) that works as 
usual: a set V is used to record already visited states, and recursion enforces the depth- 
first exploration order of the state space. Moreover, a stack Stack is used to keep track of 
the states on the current initial trace being explored. The main difference w.r.t. ordinary 
DFS is that a second, nested, depth-first search algorithm (NestedDFS) is invoked from 
accepting states on backtracking, i.e., after the recursive call to DFS. The idea is that, if 
this second depth-first search finds a state that is on Stack, then it is guaranteed that a 
cycle has been found, which contains at least one accepting state. That is, one finds the 
(un)desired lasso. The algorithm is also complete: no accepting cycle is missed. 
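The NDFS scheme described above can be rendered as a short runnable sketch. It follows the structure of the pseudo-code but is not the authors' implementation: the function name `check_emptiness`, the callback parameters `successors` and `accepting`, and the use of a set `on_stack` in place of an explicit stack are assumptions made for illustration. The function returns `True` when an accepting cycle is found and `False` when the language is empty.

```python
def check_emptiness(initial, successors, accepting):
    """Nested DFS: return True iff an accepting cycle is reachable from `initial`.

    `successors(s)` returns an iterable of successor states and `accepting(s)`
    is the acceptance predicate A. Recursion depth grows with the search depth.
    """
    visited, nested_visited = set(), set()
    on_stack = {initial}         # states on the current outer DFS path

    def nested_dfs(s):
        for t in successors(s):
            if t in nested_visited:
                continue
            if t in on_stack:
                return True      # cycle through an accepting state is closed
            nested_visited.add(t)
            if nested_dfs(t):
                return True
        return False

    def dfs(s):
        visited.add(s)
        for t in successors(s):
            if t in visited:
                continue
            on_stack.add(t)
            found = dfs(t)
            on_stack.discard(t)
            if found:
                return True
        # Backtracking from s: launch the nested search from accepting states.
        if accepting(s):
            if nested_dfs(s):
                return True
            nested_visited.add(s)
        return False

    return dfs(initial)


# Hypothetical graph: 0 -> 1 -> 2 -> 1, so the only cycle is 1 -> 2 -> 1.
graph = {0: [1], 1: [2], 2: [1]}
```

With state 2 accepting, `check_emptiness(0, graph.__getitem__, lambda s: s == 2)` finds the cycle, because the nested search started at 2 reaches state 1, which is still on the outer DFS path. With only state 0 accepting, no cycle passes through an accepting state and the search reports emptiness.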


Fig. 3. Example run of CheckEmptiness. The wavy arrow indicates the invocation of
NestedDFS((2, B)); the dashed arrow indicates how the cycle is closed.


In Fig. 3, we illustrate an example run of the CheckEmptiness algorithm on our 
example. When DFS backtracks from (2, B), NestedDFS is invoked, illustrated by the 
wavy arrow. NestedDFS generates the successor (2, A), which is on Stack, so a cycle 
is reported. We can construct an accepting run $\rho_p(\rho_c)^\omega$ with prefix $\rho_p$ induced by the
trace $l^1_I$ and cycle $\rho_c$ induced by the trace $l^2_G, l^1_G, l^1_I, l^2_I$.
3 The Decoupled State Space for Composed NBAs


As previously stated, decoupled state space search was recently developed in AI plan- 
ning [14], and adapted to model checking of safety properties later on [12]. It is 
designed to tackle the state explosion problem inherent in search problems that result 
from compactly represented systems with exponentially large state spaces. In AI plan- 
ning, where decoupled search was originally introduced, such systems are modelled 
through state variables and a set of transition rules (called “actions”). The adaptation of
decoupled search to reachability checking in SPIN presented in [12] devised decoupled 
search for automata models, but informally only. Here, we introduce decoupled search 
formally for NBA models. We define the decoupled state space for composed NBAs, as 
the result from the composition of a set of NBAs. 


3.1 Decoupled Composition of NBAs 


In contrast to the explicit construction of the state space, where all reachable states are 
generated by searching over all traces of enabled transitions, decoupled search only 
searches over traces of global transitions, the ones that synchronize the component 
NBAs. In decoupled search, a decoupled state $s^D$ compactly represents a set of states
closed by internal steps. This is done in terms of the sequence of global labels used 
to reach these states, plus a set of reached states for each component. Definition 3 for- 
malizes this through the operation decoupled composition of NBAs, which adapts the 
composition operation provided in Definition 2 to decoupled state space search. 


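Definition 3 (Decoupled composition of NBAs). The decoupled composition of n
NBAs $\mathcal{A}^1,\dots,\mathcal{A}^n$, denoted by $\mathcal{A}^1 \parallel_D \dots \parallel_D \mathcal{A}^n$, is a tuple $\langle S^D, \to_D, L_G, s^D_0, A^D\rangle$
defined as follows:


- $S^D = \mathcal{P}^+(S^1) \times \dots \times \mathcal{P}^+(S^n)$, with $\mathcal{P}^+(S) := 2^S \setminus \{\emptyset\}$.
- $s^D_0 = \langle \mathrm{iclose}(s^1_0),\dots,\mathrm{iclose}(s^n_0)\rangle$, with $\mathrm{iclose}(s)$ being the set of states $s'$ that are
  reachable from $s$ in $\mathcal{A}^i$ using only $\mathcal{A}^i$'s internal transitions $L^i_I$:
  $\mathrm{iclose}(s) = \{s' \mid s \xrightarrow{L^i_I}{}^{*} s'\}$ and $\mathrm{iclose}(S) = \bigcup_{s \in S} \mathrm{iclose}(s)$.
- $A^D(s^D) = \forall i \in \{1,\dots,n\} : \exists s_i \in S_i : A^i(s_i)$, where $s^D = \langle S_1,\dots,S_n\rangle$.
- $\to_D$ is the smallest set of transitions closed under the following rule:

  $\dfrac{l_G \in L_G \qquad \forall i \in \{1,\dots,n\} : S_i' = \{s_i' \mid \vec{s} \in s^D : \vec{s} \xrightarrow{l_G} (s_1',\dots,s_i',\dots,s_n')\} \qquad S_i' \neq \emptyset}{s^D \xrightarrow{l_G}_D \langle \mathrm{iclose}(S_1'),\dots,\mathrm{iclose}(S_n')\rangle}$

  where, abusing notation, we write $\vec{s} \in s^D$ if $s^D = \langle S_1,\dots,S_n\rangle$ and $\vec{s} \in S_1 \times \dots \times S_n$.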


In the decoupled composition $\mathcal{A}^1 \parallel_D \dots \parallel_D \mathcal{A}^n$ a decoupled state $s^D$ is defined
by a tuple $\langle s^D[\mathcal{A}^1],\dots,s^D[\mathcal{A}^n]\rangle$, consisting of a non-empty set of component states
$s^D[\mathcal{A}^i]$ for each $\mathcal{A}^i$. A decoupled state represents exponentially many member states,
namely all composed states $\vec{s} = (s_1,\dots,s_n)$ such that $\vec{s} \in s^D[\mathcal{A}^1] \times \dots \times s^D[\mathcal{A}^n]$.


We will always use a superscript D to denote decoupled states $s^D$.
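Definition 3 can be sketched in code as follows. The encoding is an assumption made for illustration and not the authors' implementation: components are dictionaries with the hypothetical keys `"initial"`, `"internal"`, `"global"` and `"trans"`, a decoupled state is a tuple of frozensets, and `A1`, `A2` are a made-up example. Because a decoupled state is a cross product of per-component sets, the successor on a global label can be computed component by component.

```python
def iclose(comp, states):
    """Close a set of component states under the component's internal transitions."""
    closed, frontier = set(states), list(states)
    while frontier:
        s = frontier.pop()
        for (src, label, dst) in comp["trans"]:
            if src == s and label in comp["internal"] and dst not in closed:
                closed.add(dst)
                frontier.append(dst)
    return frozenset(closed)


def initial_decoupled_state(components):
    return tuple(iclose(c, {c["initial"]}) for c in components)


def decoupled_successor(components, dstate, label):
    """Return the unique successor of `dstate` on global `label`, or None if disabled."""
    result = []
    for comp, reached in zip(components, dstate):
        if label in comp["global"]:
            moved = {d for (s, l, d) in comp["trans"] if s in reached and l == label}
            if not moved:          # a participant cannot synchronise: no transition
                return None
        else:
            moved = reached        # uninvolved component keeps its reached set
        result.append(iclose(comp, moved))
    return tuple(result)


# Hypothetical components sharing the global label "g".
A1 = {"initial": 1, "internal": {"i1"}, "global": {"g"},
      "trans": {(1, "i1", 2), (2, "g", 1)}}
A2 = {"initial": "A", "internal": {"i2"}, "global": {"g"},
      "trans": {("A", "g", "B"), ("B", "i2", "A")}}
```

In this example the initial decoupled state is `({1, 2}, {"A"})`, since `A1` can move from 1 to 2 internally. Its successor on `"g"` is `({1, 2}, {"A", "B"})`, a single decoupled state that stands for four composed states.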
We overload the subset operation $\subseteq$ for decoupled states $s^D$ by doing it component-wise
on the sets of reached local states, namely $s^D \subseteq t^D \Leftrightarrow \forall \mathcal{A}^i : s^D[\mathcal{A}^i] \subseteq t^D[\mathcal{A}^i]$.

During a search in the decoupled composition we define the global trace of a decoupled
state $s^D$, denoted $\pi^G(s^D)$, as the sequence of global transitions on which $s^D$ was
reached from $s^D_0$. For DFS, as considered in this work, this is well-defined.

In explicit state search, states that have been visited before (duplicates) are
pruned to avoid repeating the search effort unnecessarily. The corresponding operation
in decoupled search is dominance pruning [14]. A newly generated decoupled state $t^D$
is pruned if there exists a previously seen decoupled state $s^D$ that dominates $t^D$, i.e.,
where $t^D \subseteq s^D$. With the correctness result given below, this is safe. One can make
the representation of decoupled states, and thereby also the dominance checking, more
efficient by representing the state sets $s^D[\mathcal{A}^i]$ symbolically [17].
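Dominance pruning amounts to a component-wise subset test, which can be sketched as follows. The function name `dominated` and the representation of a decoupled state as a tuple of frozensets are assumptions made for illustration; a linear scan over the seen states is used for simplicity, whereas the symbolic representation mentioned above is not shown.

```python
def dominated(new_state, seen_states):
    """True if some previously seen decoupled state is a component-wise superset."""
    return any(
        all(new_i <= old_i for new_i, old_i in zip(new_state, old))
        for old in seen_states
    )


# Hypothetical set of already explored decoupled states.
seen = [(frozenset({1, 2}), frozenset({"A", "B"}))]
```

A new state `({1}, {"A"})` is dominated by the seen state and can be pruned, while `({1, 3}, {"A"})` is not, because local state 3 of the first component has not been covered yet.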

The initial decoupled state is obtained by closing each local state with internal steps
(iclose), and decoupled transitions generate decoupled states whose local states are also
closed under internal steps. This maximally preserves the decomposition afforded by
the decoupled representation. Namely, as we will prove in what follows, a decoupled
state $s^D$ compactly represents all explicit states that are reachable via traces that extend
the global trace $\pi^G(s^D) = l^1_G, l^2_G, \dots, l^k_G$ with local transition labels. That is, for every
component $\mathcal{A}^i$, $s^D$ contains the non-empty subset of its local states $s^D[\mathcal{A}^i] \subseteq S^i$ that
can be reached with traces $\pi_i = l_1, l_2, \dots, l_m$ such that there exist indices
$j_1 < j_2 < \dots < j_k$ where $l_{j_t} = \pi^G(s^D)[t]$ for all $1 \le t \le k$. In words, after every global label
on $\pi^G(s^D)$, arbitrary enabled sequences of internal transitions are allowed.

We remark that the decoupled composition of a set of NBAs is always deterministic.
For every pair of decoupled state $s^D$ and global label $l_G$, there is a unique successor
$t^D$. This is easy to see, since if there is a composed state $\vec{s}$ contained in $s^D$ that has
multiple outgoing transitions labelled with $l_G$, all of the composed successor states are
contained in $t^D$. This increases the possible state space reduction compared to standard
search, which needs to branch over all these successors. Note that this is different from
the determinization of NBA, which comes with a blow-up [31]. The determinism is
a consequence of the compact representation where all possible outcome states of a
non-deterministic transition are contained in the decoupled successor state.


3.2 Correctness of Decoupled Composition 


In this section we show that decoupled search, as presented here, is sound and complete
w.r.t. reachability properties. We adapt the corresponding result from AI planning [14].
We require some additional notation. For a trace $\pi$, by $\pi^G(\pi)$ we denote the
subsequence of $\pi$ that is obtained by projecting onto the global labels $L_G$.
As previously stated, the decoupled state space captures reachability of the
composed system exactly. The proof is an adaptation of previous results from AI
Planning [14] to composed NBAs as considered here.


Theorem 1. A state $\vec{t}$ of a composition of NBAs $\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$ is reachable from a
state $\vec{s}$ via a trace $\vec{\pi}$, iff there exist decoupled states $s^D, t^D$ in the decoupled composition
$\mathcal{A}^1 \parallel_D \dots \parallel_D \mathcal{A}^n$, such that $\vec{s} \in s^D$, $\vec{t} \in t^D$, and $t^D$ is reachable from $s^D$ via $\pi^G(\vec{\pi})$.

Fig. 4. Illustration of the exponential separations to ample sets (left) and unfolding (right).


Proof. Let $\pi^G(\vec{\pi}) = l^1_G, \dots, l^k_G$, and $s^D_i \xrightarrow{l^{i+1}_G}_D s^D_{i+1}$ for all $1 \le i < k$. We prove the
claim by induction over the length of $\pi^G(\vec{\pi})$. For the base case $|\pi^G(\vec{\pi})| = 0$, the claim
trivially holds, since, by the definition of iclose(), $s^D$ contains all composed states $\vec{t}$
that are reachable from any $\vec{s} \in s^D$ via only internal transitions.

Assume a decoupled state $s^D_i$ is reachable from $s^D$ via $l^1_G, \dots, l^i_G$. Then, by the
definition of decoupled transitions and iclose(), the state $s^D_{i+1}$ contains all composed
states $\vec{s}_{i+1}$ that are reachable from a state $\vec{s}_i \in s^D_i$ via a trace $\vec{\pi}^{i \to i+1}$ that consists
of only internal transitions and $l^{i+1}_G$. By hypothesis, we can extend the traces reaching
every such $\vec{s}_i$ from a $\vec{s} \in s^D$ by $\vec{\pi}^{i \to i+1}$ and obtain a trace reaching $\vec{s}_{i+1}$ from $\vec{s}$ with
global sub-trace $l^1_G, \dots, l^i_G, l^{i+1}_G$.

For the other direction, if a composed state $\vec{s}_i$ is reached in a decoupled state $s^D_i$
and can reach a state $\vec{s}_{i+1}$ via a trace $\vec{\pi}^{i \to i+1}$ that consists of internal labels and $l^{i+1}_G$,
then there exists a decoupled transition $s^D_i \xrightarrow{l^{i+1}_G}_D s^D_{i+1}$ and, again by the definition of
decoupled transitions and iclose(), $s^D_{i+1}$ contains $\vec{s}_{i+1}$. By hypothesis $s^D_i$ is reachable
from $s^D$, where $\vec{s}_i$ is reachable from $\vec{s} \in s^D$. Thus, $s^D_{i+1}$ is reachable from $s^D$ via
$l^1_G, \dots, l^i_G, l^{i+1}_G$.

3.3 Relation to Other State-Space Reduction Methods 


Prior work has investigated the relation of decoupled search to other state-space reduc- 
tion methods in the context of AI planning [14,15], in particular to strong stubborn 
sets [34], Petri-net unfolding [8,9], and symbolic representations using BDDs [2,28]. 
For all these techniques, there exist families of scaling examples where decoupled 
search is exponentially more efficient. 

We capture this formally in terms of exponential separations. A search method X
is exponentially separated from decoupled search if there exists a family of models
$\{M^n = \mathcal{A}^1, \dots, \mathcal{A}^n \mid n \in \mathbb{N}\}$ of size polynomially
related to n such that (1) the number of reachable decoupled states in
$\mathcal{A}^1 \parallel_D \dots \parallel_D \mathcal{A}^n$ is bounded by a polynomial in n,
and (2) the state space representation of $\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$
under X is exponential in n.

We next describe two scaling models showing that the ample sets variant of 
SPIN [21,29], as a representative for partial-order reduction in explicit-state search, and 
Petri-net unfolding are exponentially separated from decoupled search. For symbolic 
search with BDDs, the reduction achieved by both methods is in general incomparable. 

For ample sets, a simple model family looks as follows: there are n components,
each with the same state space: two local states $A_i$, $B_i$, initial state $A_i$, two
global transitions, and one internal transition. A component and the transitions are
depicted in the left of Fig. 4 (the dashed transition is internal). The global transitions
synchronize components pairwise; our argument holds for every possible such synchronization.

Under ample set pruning, no reduction is achieved (no state is pruned) because there 
is a global transition enabled in every state. Thus, there exists no state where only safe 
(i.e. internal) transitions are enabled, and the search always branches over all enabled 
transitions of all components. The decoupled state space, in contrast, only has a single 
decoupled state, where both local states are reached in each component. All decoupled 
successor states are dominated and will be pruned. 

Similar to decoupled search, Petri-net unfolding exploits component independence
by a special representation. Instead of searching over composed states and pruning
transitions, the states of individual components are maintained separately.¹

A scaling model showing that unfolding is exponentially separated from decoupled
search is illustrated in the right of Fig. 4. There are n components, each with the same
state space with three local states $A_i$, $B_i$, $C_i$, a global label $l_G$, and transitions
as shown in the figure. In a Petri net, this model is encoded with $3n$ places and $2^n$
transitions, one for every combination of one output place in each of the components. In the
unfolding, this results in an event (the equivalent of a state) for every net transition. The
decoupled state space has only two decoupled states: the initial state where $\{A_i\}$ is
reached for all components, and its $l_G$-successor where $\{B_i, C_i\}$ is reached in every
component.
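The size gap in this model can be made concrete with a few lines of Python. This is only a counting sketch under the assumptions stated in the text (one net transition per combination of one output place, B or C, in each component); the function names are hypothetical.

```python
from itertools import product


def petri_net_places(n: int) -> int:
    """Three places (A_i, B_i, C_i) per component."""
    return 3 * n


def petri_net_transitions(n: int) -> int:
    """One net transition per choice of an output place (B or C) in each component."""
    return sum(1 for _ in product(("B", "C"), repeat=n))


# The decoupled state space of this model has two states, independent of n.
DECOUPLED_STATES = 2

sizes = {n: (petri_net_places(n), petri_net_transitions(n), DECOUPLED_STATES)
         for n in (1, 2, 5, 10)}
```

For n = 10 the encoding already needs 1024 net transitions, while the number of decoupled states stays constant.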


4 NDFS for Decoupled Search 


We now adapt NDFS to decoupled search. We start by discussing the deficiencies of 
a naive adaptation. We will then introduce the key concepts in our fixed algorithm in 
Sect. 4.2, and present the algorithm itself in Sect. 4.3. We close this section by showing 
that the exponential separations to partial-order reduction and unfolding from Sect. 3.3 
carry over to liveness checking by simple modifications of the models. 


4.1 Issues with a Naive Adaptation of NDFS 


In a naive adaptation of NDFS to decoupled search, the only thing that changes is the 
treatment of decoupled states, which represent sets of composed states, compared to sin- 
gle states in the standard variant. This leads to three mostly minor changes: (1) instead 
of duplicate checking we perform dominance pruning; (2) checking if a decoupled state 
is accepting boils down to checking if it contains an accepting member state; and (3) to 
see if a state t contained in a state $t^D$ generated in NestedDFS is on the stack, we need
to check if $t^D$ has a non-empty intersection with a state on Stack.
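The three changes can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes that a decoupled state is stored as one set of reached local states per component, that its member states are the combinations of one reached local state per component, and that a composed state is accepting iff every component is in an accepting local state. All function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Iterator, Sequence, Set, Tuple

# Assumed representation: one frozenset of reached local states per component.
DecoupledState = Tuple[FrozenSet[str], ...]


def members(state: DecoupledState) -> Iterator[Tuple[str, ...]]:
    """Enumerate the composed member states (exponential; for illustration only)."""
    return product(*state)


def is_dominated(new: DecoupledState, visited: DecoupledState) -> bool:
    """(1) Dominance: component-wise subset, assuming non-empty component sets."""
    return len(new) == len(visited) and all(a <= b for a, b in zip(new, visited))


def has_accepting_member(state: DecoupledState, accepting: Sequence[Set[str]]) -> bool:
    """(2) Some member state is accepting: every component reaches an accepting state."""
    return all(reached & acc for reached, acc in zip(state, accepting))


def shares_member(left: DecoupledState, right: DecoupledState) -> bool:
    """(3) Non-empty intersection of the two member-state sets."""
    return all(a & b for a, b in zip(left, right))


s1 = (frozenset({"B", "D"}), frozenset({"1"}))
s3 = (frozenset({"D"}), frozenset({"1"}))
s4 = (frozenset({"C"}), frozenset({"1"}))
acc = [{"C", "G"}, {"1"}]  # hypothetical accepting local states
```

None of the three checks enumerates member states; `members` is included only to make the semantics explicit.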

As we will show next, it turns out that this naive adaptation can miss cycles due to
pruning. Revisiting a composed state in NestedDFS does not actually imply a cycle,
because reaching $t^D$ from $s^D$ entails only that every member state of $t^D$ can be
reached from at least one member state of $s^D$, not from all of them. The critical point
is that pruning does not take into account from where states are reachable.


¹ A general difference between the methods is that checking reachability of a conjunctive
property is linear in the number of decoupled states, but NP-complete for an unfolding prefix [27].
Fig. 5. Counterexample showing that a naive adaptation of the NDFS algorithm is incomplete.
The (only) component NBA $\mathcal{A}^1$ is depicted on the left. The search tree on the right
shows the entire reachable decoupled state space, where pruned states are crossed out; the wavy
arrow depicts the invocation of NestedDFS on the acceptance restriction $s_{0,A}^D$ of $s_0^D$.


Consider the example in Fig. 5. The left part of the figure shows the local state space
of component NBA $\mathcal{A}^1$. For simplicity, we only show a single component, which is
sufficient to illustrate the issue. Here, $\mathcal{A}^1$ is defined as follows:
$S^1 = \{1, 2, 3, 4\}$, $L_G = \{l_G^1, l_G^2\}$, $L_I^1 = \{l_I^1, l_I^2, l_I^3\}$,
$A^1(s) = \top$ iff $s \in \{2, 4\}$, and $s_0^1 = 1$. The transitions are
as shown in the left of the figure. The decoupled search space generated using NDFS is
depicted in the right of the figure. Pruned states are crossed out.

NestedDFS is launched (indicated by the wavy arrow) on the accepting initial state
$s_0^D$. Before explaining the main issue, we remark that, to ensure that a cycle through an
accepting member state of $s_0^D$ is found, not a cycle through a non-accepting one, we
need to restrict the set of reached local states to those that are accepting, and the states
internally reachable from those via iclose(). Thus, NestedDFS starts in what we call the
acceptance-restriction $s_{0,A}^D$ of $s_0^D$, where $s_{0,A}^D[\mathcal{A}^1] = \{2, 4\}$.
Now, the issue results from the fact that $s_{0,A}^D$ contains two accepting member states,
only one of which, namely state 2, is on a cycle. Assuming that the decoupled states are
generated in order of increasing subscripts, state 2 is first reached in NestedDFS as a member
of a descendant of $s_{0,A}^D$, but via the transition labelled with $l_G^2$ from state 3, so
the cycle cannot be closed. When the $l_G^1$-successor of $s_{0,A}^D$ is generated, its only
member state 3 has already been reached in an earlier state, so this successor is pruned and
the cycle of state 2 via $l_G^1, l_G^2$ is missed. In the next section we show how to fix this,
through an extended state representation that keeps track of reachability from a set of
reference states.

Another minor issue concerns lassos $\rho_p (\rho_c)^\omega$ whose cycle $\rho_c$ is induced
by internal labels only. These will not be detected, because NestedDFS only considers traces
via global labels. We fix this by checking for $L_I$-cycles in every accepting decoupled state
generated during DFS, to see if there exists a component that can reach such a state.


4.2 Reference-State Splits 


The problem underlying the issue described in the previous section is that pruning is 
done regardless of the accepting states in the root node of NestedDFS. We now intro- 
duce an operation on decoupled states splitting them with respect to the set of reached 
local accepting states for each component. In our algorithm, this will serve to distinguish
the different accepting states, and thus force dominance pruning to distinguish
reachability from these. Formally, we define the restriction to accepting local states as
a new transition with a global label $l_G^A$ that is a self-loop for all accepting states:


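Definition 4 (Acceptance-Split Transition). Let $(S^D, \to_D, L, s_0^D, A^D)$ be the decoupled
composition of $\mathcal{A}^1, \dots, \mathcal{A}^n$. Let $s^D$ be an accepting decoupled state,
and for $1 \le i \le n$ let $\langle s_1^i, \dots, s_{c_i}^i \rangle \subseteq s^D[\mathcal{A}^i]$
be the list of reached accepting states of $\mathcal{A}^i$, where for all $1 \le j \le c_i$:
$A^i(s_j^i) = \top$. Then the acceptance-split transition $l_G^A$ in $s^D$ is defined as follows:

\[
\frac{A^D(s^D) = \top \qquad \forall i \in \{1, \dots, n\},\, j \in \{1, \dots, c_i\} :
      s_j^i \in s^D[\mathcal{A}^i] \wedge A^i(s_j^i) = \top}
     {s^D \xrightarrow{l_G^A}_D
      \langle \langle \mathrm{iclose}(s_1^1), \dots, \mathrm{iclose}(s_{c_1}^1) \rangle, \dots,
      \langle \mathrm{iclose}(s_1^n), \dots, \mathrm{iclose}(s_{c_n}^n) \rangle \rangle}
\]

The outcome state $s_A^D$ of an acceptance-split transition is a split decoupled state. The
set of reference states of $s_A^D$ is
$R(s_A^D) := \{s \mid \exists \mathcal{A}^i : s \in s^D[\mathcal{A}^i] \wedge A^i(s) = \top\}$.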


In words, the operation splits up the single set of reached component states
$s^D[\mathcal{A}^i]$ of $\mathcal{A}^i$ into a list of state sets, where each such set
$s_A^D[\mathcal{A}^i]_s$ contains the states that can be reached via internal transitions
from the respective accepting state $s \in s^D[\mathcal{A}^i]$.
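For a single component, the acceptance split can be sketched in Python as follows. This is an illustrative sketch only: the dictionary-based encoding of internal transitions, the function names, and the toy transition relation (state 1 internally reaches the accepting states 2 and 4) are assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, Set

Relation = Dict[str, Set[str]]  # local state -> successors


def iclose(states: Iterable[str], internal: Relation) -> FrozenSet[str]:
    """All local states reachable from `states` via internal transitions only."""
    reached = set(states)
    frontier = list(reached)
    while frontier:
        current = frontier.pop()
        for successor in internal.get(current, ()):
            if successor not in reached:
                reached.add(successor)
                frontier.append(successor)
    return frozenset(reached)


def acceptance_split(reached: Iterable[str], accepting: Set[str],
                     internal: Relation) -> Dict[str, FrozenSet[str]]:
    """Map every reached accepting state (reference state) to its internal closure."""
    return {ref: iclose([ref], internal) for ref in reached if ref in accepting}


# Hypothetical component: 1 -> 2 and 1 -> 4 are internal; 2 and 4 are accepting.
internal = {"1": {"2", "4"}}
initial = iclose(["1"], internal)
split = acceptance_split(initial, {"2", "4"}, internal)
```

State 1 is not accepting, so it does not become a reference state and is dropped by the split.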

Our search algorithm will use the acceptance-split transition to generate the root
node $s_A^D$ of NestedDFS from an accepting state $s^D$ backtracked from in DFS. Hence
NestedDFS will search in the space of split decoupled states. The transitions over these
behind an $s_A^D$ are defined as follows:


Definition 5 (Split Transitions). Let $(S^D, \to_D, L, s_0^D, A^D)$ be the decoupled
composition of $\mathcal{A}^1, \dots, \mathcal{A}^n$. Let $s^D$ and $t^D$ be decoupled states,
with a transition $s^D \xrightarrow{l_G}_D t^D$. Let
$\langle s_1^i, \dots, s_{c_i}^i \rangle \subseteq S^i$ be reference states for $\mathcal{A}^i$.
Then the split transition $s_R^D \xrightarrow{l_G}_D t_R^D$ is defined such that for every
$\mathcal{A}^i$ and every $1 \le j \le c_i$ we have:

\[
t_R^D[\mathcal{A}^i]_{s_j^i} =
\begin{cases}
\mathrm{iclose}(\{s' \in t^D[\mathcal{A}^i] \mid
  \exists s \in s_R^D[\mathcal{A}^i]_{s_j^i} : s \xrightarrow{l_G} s'\}) & l_G \in L^i \\
s_R^D[\mathcal{A}^i]_{s_j^i} & l_G \notin L^i
\end{cases}
\]

The list of reference states for an $\mathcal{A}^i$ does not change along a trace of split
transitions. Let $s_A^D$ be a decoupled state generated by an acceptance-split transition
$s^D \xrightarrow{l_G^A}_D s_A^D$, then for all successor states $t_R^D$ of $s_A^D$ the set of
reference states is $R(t_R^D) = R(s_A^D)$.

We extend set operations to the split representation as follows. A split decoupled
state $s_R^D$ dominates a split decoupled state $t_R^D$, denoted $t_R^D \subseteq_R s_R^D$, if
$R(t_R^D) \subseteq R(s_R^D)$ and for all components $\mathcal{A}^i$ and reference states
$s \in R(t_R^D) \cap S^i$ we have $t_R^D[\mathcal{A}^i]_s \subseteq s_R^D[\mathcal{A}^i]_s$.
In contrast, state membership is defined in a global manner, across reference
states. Namely, the set of local states of an $\mathcal{A}^i$ reached in a split decoupled
state $s_R^D$ is defined as
$s_R^D[\mathcal{A}^i] := \bigcup_{s \in R(s_R^D) \cap S^i} s_R^D[\mathcal{A}^i]_s$.
Composed state membership is defined relative to these $s_R^D[\mathcal{A}^i]$ as before.
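The effect of split transitions and split dominance can be sketched for a single component in Python. This is a hedged illustration, not the authors' code. The transition relation below is a hypothetical toy loosely modelled on the running example (2 goes to 3 via one global label; 3 goes to 2 and 4 goes to 3 via another), and all names are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Set

Relation = Dict[str, Set[str]]          # local state -> successors
SplitSets = Dict[str, FrozenSet[str]]   # reference state -> reached local states


def iclose(states: Iterable[str], internal: Relation) -> FrozenSet[str]:
    """Closure of `states` under internal transitions."""
    reached = set(states)
    frontier = list(reached)
    while frontier:
        current = frontier.pop()
        for successor in internal.get(current, ()):
            if successor not in reached:
                reached.add(successor)
                frontier.append(successor)
    return frozenset(reached)


def split_successor(split: SplitSets, step: Relation, internal: Relation) -> SplitSets:
    """Apply one global label per reference state (component participates in the label)."""
    return {ref: iclose({t for s in reached for t in step.get(s, ())}, internal)
            for ref, reached in split.items()}


def dominated_split(t: SplitSets, s: SplitSets) -> bool:
    """t is dominated by s: same-or-fewer reference states, subset per reference state."""
    return all(ref in s and reached <= s[ref] for ref, reached in t.items())


def reaches_own_reference(split: SplitSets) -> bool:
    """Some reference state is contained in its own reached set (cycle test, one component)."""
    return any(ref in reached for ref, reached in split.items())


l1: Relation = {"2": {"3"}}                 # hypothetical global label 1
l2: Relation = {"3": {"2"}, "4": {"3"}}     # hypothetical global label 2
root: SplitSets = {"2": frozenset({"2"}), "4": frozenset({"4"})}

via_l2 = split_successor(root, l2, {})      # state 3 reached from reference state 4
via_l1 = split_successor(root, l1, {})      # state 3 reached from reference state 2
closed = split_successor(via_l1, l2, {})    # state 2 reached from reference state 2
```

Both `via_l1` and `via_l2` reach only state 3, so a non-split dominance check would prune one of them. Under the split relation neither dominates the other, and `closed` passes the cycle test for reference state 2.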
An important property of the splitting is that it preserves reachability of member
states. Concretely, for a split transition $s_R^D \xrightarrow{l_G}_D t_R^D$ induced by a
transition $s^D \xrightarrow{l_G}_D t^D$, for all $\mathcal{A}^i$ it holds that if
$s_R^D[\mathcal{A}^i] = s^D[\mathcal{A}^i]$, then $t_R^D[\mathcal{A}^i] = t^D[\mathcal{A}^i]$.

As a notation convention, we will always denote split states $s_R^D$ by a subscript R,
and the direct outcome of an acceptance-split transition by $s_A^D$ with a subscript A.


Fig. 6. With acceptance-splitting, NestedDFS invoked on the $l_G^A$-successor $s_{0,A}^D$ of
$s_0^D$ of the example in Fig. 5 correctly detects the cycle of state 2 induced by the trace
$l_G^1, l_G^2$.


Considering our example again, Fig. 6 illustrates how, on split decoupled states, the
cycle $2 \xrightarrow{l_G^1} 3 \xrightarrow{l_G^2} 2$ is not pruned. One split state is still
pruned, as it contains only component states reached from state 4. In the two split states on
the cycle, the decoupled state keeps track of the traces from the origin state 2, so neither
of the two is pruned, since they are not dominated by any previously visited split state (the
root node $s_{0,A}^D$ of NestedDFS is not yet visited).

As indicated before, in our emptiness checking algorithm we will use split decoupled
states only within NestedDFS. The seed state $s_A^D$ of NestedDFS will always be the
$l_G^A$-successor of an accepting state $s^D$ backtracked from in DFS. Every member state of
$s_A^D$ is accepting, or can be reached with $L_I$-transitions from an accepting state.


4.3 Putting Things Together: Decoupled NDFS 


We are now ready to describe our adaptation of the standard NDFS algorithm to decoupled
compositions. The pseudo-code is shown in Fig. 7. The differences w.r.t. the standard
algorithm (Fig. 2) are highlighted in blue. The basic structure of the algorithm is
preserved. It starts by putting the decoupled initial state $s_0^D$ onto the Stack in
CheckEmptiness, and launches the main DFS from it.

In DFS, the control flow does not change; decoupled states are generated in depth-first
order by recursion, updating the stack accordingly. There are, however, three differences
to the standard variant:


1. Before generating the successors, we call CheckLocalAccept on each accepting
decoupled state $s^D$. This detects cycles resulting from $L_I$-transitions, i.e., cycles
that occur “within” a decoupled state. To this end, we check whether there exists a
component $\mathcal{A}^i$ for which an accepting local state $s^i$ is reached that can reach
itself using only internal labels $L_I^i$ (the set of such local states can be precomputed, so
that the check becomes a lookup operation). We can then construct an accepting run
for the composed system by appending the $L_I^i$-cycle to the sequence of states that
reaches $s^i$ in $s^D$ for $\mathcal{A}^i$. Note that it suffices if a single component moves
and all other components remain in a reached accepting state.

2. Instead of doing duplicate checking, the algorithm performs dominance pruning,
pruning a new decoupled state $t^D$ if all its member states have been reached in an
already visited decoupled state $r^D$.




CheckEmptiness(A^1 ||_D ... ||_D A^n):
    Stack := ⟨s_0^D⟩
    V := ∅
    V' := ∅
    DFS(s_0^D)
    return empty

DFS(s^D):
    V := V ∪ {s^D}
    if A^D(s^D) then
        CheckLocalAccept(s^D)
    for all t^D s.t. s^D →_D t^D do
        if ∃ r^D ∈ V s.t. t^D ⊆ r^D then
            continue
        push(Stack, t^D)
        DFS(t^D)
    pop(Stack)
    if A^D(s^D) then
        Let s_A^D s.t. s^D →_D s_A^D via l_G^A
        NestedDFS(s_A^D)
        V' := V' ∪ {s_A^D}

NestedDFS(s_R^D):
    for all t_R^D s.t. s_R^D →_D t_R^D do
        if ∃ r_R^D ∈ V' s.t. t_R^D ⊆_R r_R^D then
            continue
        if ∀ A^i : ∃ s : s ∈ t_R^D[A^i]_s then
            return cycle
        if ∃ r^D ∈ Stack s.t. r^D ⊆ t_R^D then
            return cycle
        V' := V' ∪ {t_R^D}
        NestedDFS(t_R^D)

CheckLocalAccept(s^D):
    if ∃ A^i, s ∈ s^D[A^i] : A^i(s) ∧ s reaches itself via one or more labels l_I ∈ L_I^i
        then return cycle


Fig. 7. Adaptation of a standard NestedDFS for lasso search in decoupled compositions of NBA. 


3. As discussed in Sect. 4.2, when we launch NestedDFS at a decoupled state $s^D$, we
do so on the acceptance-split $l_G^A$-successor $s_A^D$ of $s^D$.
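The precomputation mentioned in item 1 can be sketched as follows. This is a minimal Python sketch under the assumption that the internal transitions of a component are given as a successor map; the names `internal_cycle_states` and `check_local_accept` are hypothetical and do not refer to the prototype's code.

```python
from typing import Dict, Iterable, Set

Relation = Dict[str, Set[str]]  # local state -> internal successors


def internal_cycle_states(internal: Relation) -> Set[str]:
    """Local states that reach themselves via one or more internal transitions."""
    on_cycle: Set[str] = set()
    for start in internal:
        frontier = list(internal[start])
        seen: Set[str] = set()
        while frontier:
            current = frontier.pop()
            if current == start:
                on_cycle.add(start)
                break
            if current in seen:
                continue
            seen.add(current)
            frontier.extend(internal.get(current, ()))
    return on_cycle


def check_local_accept(reached: Iterable[str], accepting: Set[str],
                       on_cycle: Set[str]) -> bool:
    """Lookup: is some reached accepting local state on an internal cycle?"""
    return any(s in accepting and s in on_cycle for s in reached)


# Hypothetical component: a <-> b form an internal cycle, c is a dead end.
internal = {"a": {"b"}, "b": {"a", "c"}, "c": set()}
cyclic = internal_cycle_states(internal)
```

The precomputation is done once per component, so the test performed on each accepting decoupled state reduces to set lookups.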


NestedDFS now starts in the acceptance-split $s_A^D$ and traverses split transitions
as per Definition 5. On generation of a new state $t_R^D$, we perform dominance pruning
against the decoupled states visited during all prior calls to NestedDFS. If in a $t_R^D$ for
every component $\mathcal{A}^i$ there exists a reference state $s \in S^i$ that is reachable
from itself, so $s \in t_R^D[\mathcal{A}^i]_s$, then we can construct a cycle. As we will show
in Theorem 4, this test is guaranteed to find all cycles that start from an accepting state
$s_A \in s_A^D$.

Note that we cannot check for a non-empty intersection with states $r^D$ on Stack,
since these are not split relative to the reference states of $s_A^D$. Thus, since we do not
know from which local state in $r^D$ the state in the intersection was reached, such a
non-empty intersection would not imply a cycle. What we can do, however, is check
for dominance instead, as an algorithm optimization inspired by [22]. The pseudo-code
in Fig. 7 does so by checking whether $t_R^D \supseteq r^D$, where the $\supseteq$ relation
between a split vs. non-split state is simply evaluated based on the overall sets
$t_R^D[\mathcal{A}^i]$ vs. $r^D[\mathcal{A}^i]$ of reached component states. If this domination
relation holds true, then the reachability issue mentioned in the previous section is resolved
because all $t \in r^D$ are then reachable from $s_A^D$, including those t from which an
accepting state $s \in s_A^D$ is reachable.
Lemma 1 in the next section will spell out this argument as part of our correctness proof.
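The comparison between a split state and a non-split state on the stack can be sketched in Python. This is an illustration under assumed data structures (a split state as one dictionary per component, mapping reference states to reached sets; a non-split state as one set per component); the names are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence

SplitSets = Dict[str, FrozenSet[str]]  # reference state -> reached local states


def flatten(component: SplitSets) -> FrozenSet[str]:
    """Overall set of reached local states, across all reference states."""
    return frozenset().union(*component.values())


def covers(split_state: Sequence[SplitSets],
           stack_state: Sequence[FrozenSet[str]]) -> bool:
    """True iff the split state contains every reached local state of the stack state."""
    return len(split_state) == len(stack_state) and all(
        reached <= flatten(component)
        for component, reached in zip(split_state, stack_state))


# Hypothetical two-component example.
t_split = [{"C": frozenset({"E", "F"}), "G": frozenset({"D"})},
           {"1": frozenset({"1"})}]
r_small = [frozenset({"D", "F"}), frozenset({"1"})]
r_large = [frozenset({"B", "D"}), frozenset({"1"})]
```

The information about which reference state reached which local state is deliberately ignored in this check, which is why it is a dominance test rather than an intersection test.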

Observe that splitting a decoupled state incurs an increase in the size of the state 
representation, as the same local state may be reached from several reference states. 
More importantly, as dominance pruning is weaker on split states (which after all is 
the purpose of the split operation) the size of the search space may increase. As shown 
by the example in Fig. 5, though, there is no easy way around the splitting, since the
algorithm has to be able to know from which component state the successor states
are reached. Assuming a component has M accepting states, then in the worst case all
local successor states that are shared between these accepting states can be visited M
times across all NestedDFS invocations. Unless some of the decoupled states revisiting
the same member state are pruned by dominance pruning, it can actually happen that
the revisits multiply across the components, so the size of the decoupled state space in
NestedDFS can potentially be exponentially larger than the standard state space. As we
shall see in our experimental evaluation, typically such blow-ups do not seem to occur.

Fig. 8. Illustration of the component NBAs used in Example 1.

In case we want to construct a lasso, we need to store a pointer to the predecessor
of each decoupled state and the label of the generating transition. With this, we can, for
each component $\mathcal{A}^i$ separately, reconstruct a trace $\pi$ of a state $t \in t^D$
reached from a state $s \in s^D$ where $\pi^G(s^D, t^D) = \pi^G(\pi)$. Here, for a decoupled
state $t^D$ that was reached from another decoupled state $s^D$, by $\pi^G(s^D, t^D)$ we denote
the global trace via which $t^D$ was reached from $s^D$. This can be done in time polynomial in
the size of the component and linear in the length of $\pi^G(s^D, t^D)$. Since the traces for
all components are synchronized via $\pi^G(\pi)$, we add the required internal labels for each
component in between every pair of global labels. We remark that, to decide if a lasso exists,
we do not need to store any predecessor or generating label pointers.
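The last step, merging per-component traces that agree on their global labels, can be sketched in Python. This is a simplified illustration: it assumes every component trace lists the complete global label sequence (also for labels in which the component does not take part), which is a simplification made for the example. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def split_segments(trace: Sequence[str],
                   global_labels: Set[str]) -> Tuple[List[List[str]], List[str]]:
    """Cut a component trace into internal segments separated by global labels."""
    segments: List[List[str]] = [[]]
    globals_seen: List[str] = []
    for label in trace:
        if label in global_labels:
            globals_seen.append(label)
            segments.append([])
        else:
            segments[-1].append(label)
    return segments, globals_seen


def interleave(traces: Sequence[Sequence[str]], global_labels: Set[str]) -> List[str]:
    """Merge component traces: internal segments first, then the shared global label."""
    if not traces:
        return []
    parts = [split_segments(trace, global_labels) for trace in traces]
    global_sequence = parts[0][1]
    if any(seen != global_sequence for _, seen in parts):
        raise ValueError("component traces disagree on the global label sequence")
    merged: List[str] = []
    for k in range(len(global_sequence) + 1):
        for segments, _ in parts:
            merged.extend(segments[k])
        if k < len(global_sequence):
            merged.append(global_sequence[k])
    return merged


merged = interleave([["a1", "g1", "a2", "g2"], ["g1", "b1", "g2", "b2"]], {"g1", "g2"})
```

Internal transitions of different components are independent, so any order of the internal segments between two global labels yields a valid composed trace; the sketch simply uses component order.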

We next show on an example how our algorithm works. 


Example 1. The model has two component NBAs $\mathcal{A}^1$, $\mathcal{A}^2$ illustrated in
Fig. 8. It is a variant of an example from [26]. The figure should be self-explanatory; we
remark that all global transitions induce a self-loop in the only state 1 of $\mathcal{A}^2$.

CheckEmptiness starts by putting $s_0^D$ onto Stack and enters DFS($s_0^D$). Let
$s_1^D = (\{B, D\}, \{1\})$, $s_2^D = (\{E, F\}, \{1\})$, and $s_3^D = (\{D\}, \{1\})$ be the
successors generated along a trace of three global labels in DFS. Since
$s_3^D \subseteq s_1^D \in V$, $s_3^D$ is pruned and the search backtracks. Say DFS selects
another global transition next, generating the state $(\{C\}, \{1\})$ and its successor
$(\{F\}, \{1\})$. Then the latter is pruned because it is dominated by $s_2^D \in V$, and the
search backtracks from $(\{C\}, \{1\})$, which is accepting.

Thus, NestedDFS is invoked on its acceptance-split successor
$(\langle \{C\}_C \rangle, \langle \{1\}_1 \rangle)$, because C and 1 are accepting local
states that become the reference states of this split state. NestedDFS will follow a trace of
global labels which, among others, generates a state
$(\langle \{G\}_C \rangle, \langle \{1\}_1 \rangle)$ and ends in a second state with the same
reached sets $(\langle \{G\}_C \rangle, \langle \{1\}_1 \rangle)$. The latter is pruned,
because it is dominated by the former, which is contained in $V'$. No cycle is reported. This
is correct, because the only member state (C, 1) of the root of this NestedDFS invocation does
not occur on a cycle.
Fig. 9. Illustration of the exponential separations to ample sets (left) and unfolding (right).


DFS then backtracks to $s_1^D = (\{B, D\}, \{1\})$ and generates its remaining successor
$(\{G\}, \{1\})$ via a further global transition. DFS further generates the successors of this
state and eventually backtracks from it, invoking NestedDFS on its acceptance-split successor
$(\langle \{G\}_G \rangle, \langle \{1\}_1 \rangle)$.

After two global transitions, the resulting split state
$(\langle \{G\}_G \rangle, \langle \{1\}_1 \rangle)$ satisfies the condition that for all
components $\mathcal{A}^i$ there exists an s with $s \in t_R^D[\mathcal{A}^i]_s$, namely G
and 1. Thus, a cycle is reported. It is induced by the corresponding trace of global labels.

Note that no decoupled state in the second NestedDFS is pruned, since none of
them is dominated by a state in $V'$ of the first NestedDFS invocation. In particular,
$(\langle \{G\}_G \rangle, \langle \{1\}_1 \rangle)$ is not dominated by
$(\langle \{G\}_C \rangle, \langle \{1\}_1 \rangle)$, because the reference
states differ: G and 1 for the former, and C and 1 for the latter.


4.4 Relation to Other State-Space Reduction Methods 


The comparison to ample set pruning and Petri-net unfolding from Sect. 3.3 carries over 
directly to liveness checking via simple adaptations to the examples, see Fig. 9. 


Theorem 2. CheckEmptiness with explicit-state search and ample sets pruning is 
exponentially separated from CheckEmptiness with decoupled search. 


Proof (sketch). The argument from Sect. 3.3 remains valid. With the states $B_i$ accepting
(see Fig. 9, left), explicit-state search with ample sets pruning in the worst case has to
exhaust the entire state space. It invokes NestedDFS on the accepting state in which every
component is in $B_i$ and, worst-case, needs to exhaust the state space again to detect the
cycle. Decoupled search invokes NestedDFS on the initial state restricted to the component
states $B_i$. Every successor of that state closes the cycle via an arbitrary global
transition. So there are only three decoupled states overall (including the
acceptance-restricted initial state).


Theorem 3. Constructing a complete unfolding prefix is exponentially separated from 
CheckEmptiness with decoupled search. 


Proof (sketch). The component states $B_i$ are made accepting and internal transitions
$B_i \to A_i$ are added to the model (see Fig. 9, right). Unfolding constructs a complete
prefix as described in Sect. 3.3, plus one event for each new internal transition.² Decoupled
search generates the two states as described. The second state has $\{A_i, B_i, C_i\}$
reached for all components, its successor via $l_G$ is pruned. NestedDFS is invoked on
its restriction to $B_i$, in which all $A_i$ get reached via the new internal transitions. The
$l_G$-successor of this state closes the cycle, so there are only four decoupled states.


² A weaker cut-off rule is required for liveness checking that can only increase the prefix size [8].
5 Decoupled NDFS Correctness


We now show the correctness of our approach. In Lemmas 1, 2, 3, we show that if our
algorithm reports a cycle, then there exists an accepting run for
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$. In
Theorem 4, we then show that decoupled NDFS does not miss an accepting run.

We first show that the optimization of checking dominance of states in NestedDFS 
against states on the stack is sound, i.e., that an accepting run exists. 


Lemma 1. Let $r^D$ be a decoupled state on the current DFS Stack, and let $t_R^D$ be a
decoupled state generated by NestedDFS. If $t_R^D \supseteq r^D$, then there exists an
accepting run for $\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$.


Proof. Let $s^D$ be the accepting state that is backtracked from in DFS, i.e., the current
NestedDFS was invoked on its $l_G^A$-successor $s_A^D$.

From Theorem 1 we know that if $s_2^D$ is reachable from $s_1^D$, then for every state
$s_2 \in s_2^D$ there exists a state $s_1 \in s_1^D$ such that $s_1 \xrightarrow{\pi} s_2$,
where $\pi^G(\pi) = \pi^G(s_1^D, s_2^D)$.

This result also holds for decoupled states reached in NestedDFS from states in
DFS. This is because the acceptance-split transition $l_G^A$ only restricts the set of reached
member states of $s^D$ in $s_A^D$, so in particular $s_A^D \subseteq s^D$. Furthermore, split
transitions generating states behind $s_A^D$ do not affect reachability of member states of
these split decoupled states compared to their non-split counterparts.

In particular, (1) for every state $s_2 \in s^D$ there exists a state $s_1 \in r^D$ that
reaches $s_2$ on a trace $\pi$ where $\pi^G(\pi) = \pi^G(r^D, s^D)$, which, with
$s_A^D \subseteq s^D$, also holds for all $s_2 \in s_A^D$; and (2) for every state
$t \in t_R^D$ there exists an accepting state $s_A \in s_A^D$ that reaches t on a trace $\pi$
where $\pi^G(\pi) = \pi^G(s_A^D, t_R^D)$.

Since $r^D$ is on Stack, it holds that every $s \in s^D$ is reachable from an $r \in r^D$, and,
with $t_R^D \supseteq r^D$, that every $r \in r^D$ is reachable from an accepting state
$s_A \in s_A^D$.

Let pred($s_1^D$, $s_2^D$, $s_2$) be a function that, if $s_2^D$ is reachable from $s_1^D$ and
$s_2 \in s_2^D$, outputs a state $s_1 \in s_1^D$ that reaches $s_2$ via a trace $\pi$ with
$\pi^G(s_1^D, s_2^D) = \pi^G(\pi)$.

Let $s_0$ be a state reached in both $t_R^D$ and $r^D$, and let
$s_1$ = pred($r^D$, $t_R^D$, $s_0$) be its predecessor in $r^D$. If $s_1 = s_0$, then we are
done, because there exists a lasso $s_0, \dots, s_0, \dots, s_A, \dots, s_0, \dots, s_A$, where
$s_A$ is an accepting state traversed in $s_A^D$. Such an accepting state exists because all
member states of a decoupled state in NestedDFS are reachable from an accepting state in
$s_A^D$.

If $s_1 \neq s_0$, then we iterate and set $s_i$ = pred($r^D$, $t_R^D$, $s_{i-1}$), where such
$s_i$ exist because $r^D \subseteq t_R^D$. Because there are only finitely many states in
$r^D$, eventually we get $s_i = s_j$ (where $j < i$) and there exists a lasso as follows:

First, there exists a cycle $s_i, \dots, s_{i-1}, \dots, s_j = s_i$, where between every pair
of states $s_k$, $s_{k-1}$ an accepting state $s_{k,A}$ in $s_A^D$ is traversed, for the same
reason as before. We can obviously shift and truncate the cycle to start right after and end
in $s_{i,A}$. The prefix of the lasso is $s_0, \dots, s_{i,A}$.


Lemmas 2 and 3 show the soundness of our main termination criterion, and of 
CheckLocalAccept. 


Lemma 2. Let $t_R^D$ be a split decoupled state generated in NestedDFS. If for every
component $\mathcal{A}^i$ there exists a component state $s^i$ such that
$s^i \in t_R^D[\mathcal{A}^i]_{s^i}$, then there exists an accepting run for
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$.

Proof. Let $s_A^D$ be the acceptance-split decoupled state from which NestedDFS was
started. If for every component $\mathcal{A}^i$ such an $s^i$ exists, then the state
$s = (s^1, \dots, s^n)$ is reachable in both $s_A^D$ and $t_R^D$. By the construction of the
reached state sets $t_R^D[\mathcal{A}^i]_{s^i}$, s is reachable from itself and is accepting.
Hence, there exists a lasso $s_0, \dots, s, \dots, s$.


Lemma 3. Let $t^D$ be an accepting decoupled state generated in DFS. If a cycle
is reported by CheckLocalAccept($t^D$), then an accepting run for
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$ exists.


Proof. By prerequisite, there exists an accepting member state s of $t^D$. If
CheckLocalAccept($t^D$) reports a cycle, then there exists a component $\mathcal{A}^i$, where
an accepting state $s^i \in t^D[\mathcal{A}^i]$ is reached that lies on a cycle induced by
transitions labelled with $L_I^i$. Thus, we can set the local state of $\mathcal{A}^i$ in s to
$s^i$, and the lasso looks as follows: $s_0, \dots, s, \dots, s$, where on the cycle only
$\mathcal{A}^i$ moves.


We are now ready to prove the correctness of our decoupled NDFS algorithm. 


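Theorem 4. Let $\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$ be the composition of n
NBA and let $\mathcal{A}^1 \parallel_D \dots \parallel_D \mathcal{A}^n$ be its decoupled
composition. Then CheckEmptiness($\mathcal{A}^1 \parallel_D \dots \parallel_D \mathcal{A}^n$)
reports a cycle if and only if an accepting run for
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$ exists.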


Proof. If CheckEmptiness reports a cycle, then by Lemmas 1, 2, and 3, which cover
exactly the cases where a cycle is reported, an accepting run for
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$ exists.

For the other direction, assume that $\rho$ is an accepting run for
$\mathcal{A}^1 \parallel \dots \parallel \mathcal{A}^n$. Let $s_a$, with $0 \le a < k$, be the
accepting state that starts the cycle of the lasso $\rho_p = s_0, \dots, s_a$,
$\rho_c = s_{a+1}, \dots, s_k$, where $s_a = s_k$. Let $\pi = l_1, \dots, l_k$ be the trace on
which $s_k$ is reached, i.e., for all $0 \le i < k$: $(s_i, l_{i+1}, s_{i+1}) \in \to$.

By Theorem 1, there exists a decoupled state $s^D$ reached in DFS that contains $s_a$.

If $\pi$ is such that for all $a < i \le k$ : $l_i \in L_I$, i.e., the cycle $\rho_c$ is
induced only by internal labels, we next prove that CheckLocalAccept($s^D$) reports a cycle:
As $s_a$ is accepting, $s^D$ is accepting, too, so unless a cycle is reported before,
eventually CheckLocalAccept($s^D$) is called. If $\rho_c$ is induced by only internal labels,
then, because there cannot be any component interaction via $L_I$-transitions, there must
exist a component $\mathcal{A}^i$ for which the local state $s^i$ in $s_a$ reaches itself with
only $L_I^i$-transitions. We can pick any such $\mathcal{A}^i$ and ignore transitions from
$\rho_c$ that are labelled by an element of $L_I \setminus L_I^i$, since these are not
required for an accepting cycle. Consequently, CheckLocalAccept($s^D$) reports a cycle.

We next show that, if $\pi$ contains a global label on the cycle, i.e., there exists
an $i \in \{a+1, \dots, k\}$ such that $l_i \in L_G$, then, unless a cycle is reported before,
NestedDFS($s_A^D$) reports a cycle, where $s_A^D$ is the $l_G^A$-successor of $s^D$.

Assume for contradiction that this is not the case, i.e., no cycle has been reported
before, and NestedDFS($s_A^D$) does not report a cycle. Let NestedDFS($s_A^D$) be the first
call to NestedDFS that misses a cycle, although an $s_a \in s_A^D$ that is on a cycle exists.

If $s_a$ is on a cycle, then by Theorem 1 there exists a decoupled state $t^D$ reachable
from $s^D$ that also contains $s_a$. The result of Theorem 1 holds in this case because, by
the definition of split transitions, the splitting does not affect reachability of member
states. So there exists $t_R^D$ reachable from $s_A^D$ that contains $s_a$.
Dining Philosophers

        SPIN                      Cunf                  DecNDFS
 #A     Time   #States   Mem      Time     #E     M     Time   #States    M
  3      0.0        76   129      0.00     75     6     0.00        36    8
  4      0.0       348   129      0.00    162     6     0.00        97    8
  5      0.0      2000   129      0.00    293     6     0.00       272    8
  6      0.0      9416   131      0.01    482     7     0.01       783    8
  7      0.2     45132   139      0.01    735     8     0.06      2290    8
  8      1.3      212K   175      0.02   1066     9     0.60      6761    8
  9      7.9      992K   333      0.02   1481    11     5.49     20.1K    9
 10     46.8       46M   993      0.04   1994    15     56.7     59.9K   14
 11    278.0     21.6M  3965      0.04   2386    18      558      179K   44
 12        -         -     -      0.06   2874    23        -         -    -

Ring Topology

        SPIN                      Cunf                  DecNDFS
 #A     Time   #States   Mem      Time     #E     M     Time        #S    M
  6     0.10      815K   133      0.00    342     6     0.00         8    8
  7     0.95      560K   157      0.00    484     7     0.00         9    8
  8     8.35      3.7M   303      0.01    651     7     0.00        10    8
  9     73.6     24.6M  1367      0.01    843     8     0.00        11    8
 10        -         -     -      0.01   1060     9     0.00        12    8
 15        -         -     -      0.03   2525    17     0.00        17    8
 20        -         -     -      0.10   4570    37     0.00        22    8
 25        -         -     -      0.22   7240    74     0.01        27    8
 50        -         -     -      3.80    30K   917     0.06        52    8
 75        -         -     -         -      -     -     0.26        77    8


Fig. 10. Statistics on the two scaling models, where #A is the number of philosophers, resp. the 
number of NBAs, Time is runtime in seconds, #States (#S) and #E are the number of visited 
states, resp. generated events, and Mem (M) is the memory usage in MiB. 


Denote by me = [g41,...l, the cycle part of m. Because s, is an accepting mem- 
ber state of s”, all its component states si become reference states in sp. Therefore, 
assuming that nF (59, tR) = nF (7e), for all components we have s‘, € tR A's and 
a cycle is reported. If this is not the case, then either (1) sP was not reached in DFS, or 
(2) tR was not reached in NestedDFS(s7). 

In case (1), there must exist a state s8 D sP that prunes sP. But then, sB contains 
Sa, too, and NestedDFS was called on its [4-successor sB a and the cycle of Sa was 
missed before, in contradiction. l 

For (2), either (a) there exists a state iD RƏR i that was reached in a prior invo- 
cation of NestedDFS on an accepting state sp 4, Or (b) a state t? p D tR was reached 
in NestedDFS(s7) before ae In both cases, tR is pruned and the cycle through Sa 
is missed. Case (a) can only happen if sP 4 contains Sa, too, because the reference 
states of s? need to be a subset of the ones of sP, a- But then, the cycle of sa was 
missed before, in contradiction. For (b), if tR CR tP r» then for all A’ we have 
sy E tp rMA'lsi , SO the cycle would have been reported before, in contradiction. 

The reachability argument in (1,2a,2b) applies recursively to all predecessors of s? 
in DFS, and of tp in NestedDFS(s7), so, unless a cycle is reported before, eventually 
a state s? is reached in DFS that contains sa, and a state tR with s4 € tR[A}] i in 
NestedDFS(s7). 


□


6 Experimental Evaluation 


We implemented a prototype of the decoupled NDFS algorithm from Fig. 7. The input 
is specified in the Hanoi Omega-Automata format [1], describing a set of NBAs syn- 
chronized via global labels as in Definition 2. We compare our prototype to the SPIN 
model checker [20] (v6.5.1), and to the Cunf Petri-net unfolding tool [30] (v1.6.1). We 
also experimented with the symbolic model checkers NuSMV and PRISM [3,25], but 
both are significantly outperformed by the other methods. We conjecture that this is 
Fig. 11. Left part: scatterplots with the runtime of DecNDFS on the y-axis and the one of SPIN
(left column) and Cunf (right column) on the x-axis, on randomly generated models. Each point 
represents one instance. In the top row, we highlight different ratios of local labels with differ- 
ent colors/shapes, in the bottom row we highlight different numbers of components. Right part: 
illustrations of the ring model (top) and the fork (middle) and philosopher (bottom) NBAs of the 
philosophers model. Initial (accepting) states are marked by an incoming arrow (double circle). 


because both systems are not specifically designed for asynchronous execution of pro- 
cesses, or LTL model checking. For SPIN, we translate each NBA to a process where 
NBA states are represented by state labels, internal transitions by goto statements, and 
global transitions by rendezvous channel operations. For the latter, SPIN only supports 
synchronization of two processes at a time, so we restrict the models to global transi- 
tions with exactly two components. We model acceptance for SPIN explicitly using a 
monitor process that gets into an accepting state if all processes are in a local accept- 
ing state. The translation for Cunf encodes NBA states as net places and transitions as 
net transitions into a single Petri net, ignoring the individual components. In our proto- 
type and in SPIN, when a lasso is reported or the algorithm proved that no lasso exists 
within the cut-off limits, we say that the instance was solved. For Cunf, we attempt to 
construct a complete unfolding prefix. We consider an instance solved if the construc- 
tion terminates, i.e., we do not actually check the liveness property. The experiments 
were performed on a cluster of Intel E5-2660 machines running at 2.20 GHz, with time 
(memory) cut-offs of 15 min (4 GiB). Our code and models are publicly available [13]. 

We compare SPIN with standard options, i.e., with partial-order reduction enabled, 
Cunf with the cut-off rule of [10], and decoupled search (DecNDFS), using two kinds of 
benchmarks: (1) two scaling examples to showcase the behaviour on well-known mod- 
els. One is an encoding of the dining philosophers problem, the other is a ring-shaped 
synchronisation topology. Both are illustrated in Fig. 11 (right). The philosophers model 
has 2N NBAs, N philosophers and N forks, synchronized by the global transitions shown in
Fig. 11. After synchronizing with its left and right fork, a philosopher can perform an
Ratio     #   SPIN  Cunf  DecNDFS  ||  #A     #   SPIN  Cunf  DecNDFS
                                   ||   2   750    721   750      749
                                   ||   3   750    696   745      712
0%     1050    373   369      319  ||   4   750    411   243      541
20%    1050    385   384      462  ||   5   750    130   114      372
40%    1050    397   384      573  ||   6   750     49    53      266
60%    1050    422   394      723  ||   7   750     24    34      180
80%    1050    468   431      888  ||   8   750     14    23      145
Σ      5250   2045  1962     2965  ||   Σ  5250   2045  1962     2965


Fig. 12. Number of solved instances on the random models as a function of the ratio of internal 
transitions (left) and the number of components #A (right). 


internal eat transition; after releasing the forks it can perform an internal think transi- 
tion. In the ring-topology model, each component can enter a diamond-shaped region 
via internal transitions, followed by a synchronization with its left or right neighbor 
via a global label. No accepting run exists for either model. Moreover, (2) we use a set
of random automata, where for each combination of a ratio of internal transitions in
{0%, 20%, ... , 80%}, i.e., the number of transitions with an internal label divided by the
total number of transitions, and a number of components in {2,...,8}, we generated 
sets of 150 random graphs. Each component has 15 to 100 local states, out of which up 
to 3% are accepting (at least one). We ensure that none of the instances has an inter- 
nal accepting cycle to focus on more interesting cases. One could easily implement a 
lookup similar to CheckLocalAccept, which is necessary for DecNDFS, for the other 
methods, too, which then essentially simplifies the problem to basic reachability. 
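As a quick sanity check of the benchmark sizes, the parameter grid described above can be enumerated as follows. This is an illustrative sketch only: the names `RATIOS`, `COMPONENTS`, and `INSTANCES_PER_CELL` are hypothetical and are not taken from the authors' generator.

```python
from itertools import product

# Parameters as stated in the text; the identifiers are illustrative.
RATIOS = [0, 20, 40, 60, 80]       # percentage of internal transitions
COMPONENTS = list(range(2, 9))     # number of component NBAs, 2..8
INSTANCES_PER_CELL = 150           # random graphs per combination

grid = list(product(RATIOS, COMPONENTS))
total = len(grid) * INSTANCES_PER_CELL
per_ratio = len(COMPONENTS) * INSTANCES_PER_CELL
per_component_count = len(RATIOS) * INSTANCES_PER_CELL

print(total, per_ratio, per_component_count)
```

The three numbers correspond to the instance counts in the "#" columns of Fig. 12 (overall, per ratio, and per number of components).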

In Fig. 10, we show detailed statistics for the scaling models, with increasing num-
ber of components #A (Time in seconds, #States is the sum of states visited in both
DFSs, #E is the number of events in the prefix, Memory in MiB). In dining philosophers, 
SPIN and DecNDFS show similar results. SPIN has a runtime advantage in the larger 
instances of roughly a factor of 2, but DecNDFS uses only a fraction of the memory. 
Cunf clearly outperforms both. This model is not very well suited to decoupled search. 
Only half of the NBAs have internal transitions, and only two each, and there are no 
non-deterministic transitions that DecNDFS could represent compactly. On the ring- 
topology model, SPIN manages to exhaust the search space for up to 9 components. 
Cunf and DecNDFS scale significantly higher; for DecNDFS, the number of decoupled states
grows only linearly in the number of components. Cunf, on the other hand, does show a blow-
up and runs out of memory between 50 and 75 components. This showcase example
only serves to illustrate a near-optimal case for decoupled search reductions, which
likely does not carry over to this extent to real-world models.

In Fig. 11 (left part), we show detailed runtime behaviour in terms of scatter plots 
with a per-instance comparison on the random models. Each point corresponds to one 
instance, where the x-value is the runtime of SPIN, resp. Cunf, and the y-value is the 
runtime of DecNDFS, so points below the diagonal indicate an advantage of DecNDFS. 
Different ratios of internal labels (top row) and numbers of components (bottom row) 
are depicted in different colors/shapes. We observe that, as expected, with a higher ratio 
of internal transitions, the advantage of DecNDFS increases significantly. For all ratios, 
DecNDFS clearly improves with a higher number of components. 
In Fig. 12, for the same benchmark set we show the number of solved instances
as a function of the ratio (left) and of the number of components (right). Here, we 
see that from around 20% internal transitions, DecNDFS consistently beats both SPIN 
and Cunf. SPIN and Cunf also benefit from the decrease in synchronizing statements, 
although not as much as DecNDFS. On the right, we see that starting with 4 component 
NBAs (#A), DecNDFS consistently beats SPIN and Cunf. While SPIN and Cunf show a 
significant decline with more components, this effect is less pronounced for DecNDFS. 


7 Concluding Remarks and Future Work 


We have presented an approach to adapt decoupled search, an AI planning technique to 
mitigate the state-space explosion, to the verification of liveness properties of composed 
NBAs. Specifically, we have adapted a standard on-the-fly algorithm for checking ω-
regular properties, nested depth-first search (NDFS), and proven its correctness. The
necessary adaptations essentially pertain to the conditions that identify the existence 
of accepting runs, which must be handled differently given the different properties of 
decoupled states. Our approach extends the scope of decoupled search from safety prop- 
erties, as done in [12], to liveness properties. Our experimental evaluation has shown 
that decoupled search can yield significant reductions in search effort across random 
models that consist of a set of synchronized NBAs, and simple scaling showcase exam- 
ples. 

We have focused on a verification problem for composed NBAs that is sufficiently 
general to cover significant cases like automata-based LTL model checking. We believe 
that our solution can be adapted to other verification problems for composed NBAs, 
including Büchi automata with multiple acceptance conditions such as generalized
Büchi automata, and language intersection of the involved automata. Indeed, NDFS has
successfully been used for emptiness checking of generalized NBA. We are confident
that decoupled NDFS can be adapted to the compilation introduced by [33], where an
additional “counter component” is added to keep track of the components that already
have an accepting cycle during the nested DFS. Concretely, we believe that the verifi-
cation problem of generalized NBA can be handled with adaptations by our approach:
In the compilation by [33], the counter component increases its local state from 1 to n
(assuming n components), one by one whenever component i has an accepting state.
We can essentially apply the same compilation in decoupled NDFS, restricting the set of
local states of $\mathcal{A}^i$ to the accepting ones when the counter is increased from i to i + 1 by
a separate acceptance-split transition for each $\mathcal{A}^i$. This ensures that a global cycle
includes an accepting state for all components.

There are several interesting topics for future work, like the adaptation of opti- 
mizations proposed for basic NDFS (e.g. [22,32]), or the combination with orthogo- 
nal state space reduction methods, as previously done in the context of AI planning 
for partial-order reduction [16], symmetry reduction [18], and symbolic search [17]. 
Having focused on NDFS [5,22,32] in this work, we believe that the adaptation of 
SCC-based algorithms is a promising line of research [6,11], extending the scope of 
decoupled search further to model checking of CTL properties [24]. 
Abstract. AIGEN is an open source tool for the generation of transi-
tion systems in a symbolic representation. To ensure diversity, it employs
a uniform random sampling over the space of all Boolean functions 
with a given number of variables. AIGEN relies on reduced ordered 
binary decision diagrams (ROBDDs) and canonical disjunctive normal 
form (CDNF) as canonical representations that allow us to enumerate 
Boolean functions, in the former case with an encoding that is inspired by 
data structures used to implement ROBDDs. Several parameters allow 
the user to restrict generation to Boolean functions or transition systems 
with certain properties, which are then output in AIGER format. We 
report on the use of AIGEN to generate random benchmark problems 
for the reactive synthesis competition SYNTCOMP 2019, and present 
a comparison of the two encodings with respect to time and memory 
efficiency in practice. 


1 Introduction 


Verification and synthesis algorithms require benchmark problems that can be 
used for testing and evaluation. Unfortunately, a diverse set of benchmarks is 
very hard to obtain. This is a problem not only for tool developers, but also for 
organizers of competitions [3,4,8, 11] that need to evaluate tools on a wide range 
of benchmarks, and to regularly search for new meaningful benchmarks. 

If done properly, the generation of random benchmarks can be a solution 
to this problem by providing the best possible diversity and by generating new 
benchmarks whenever needed. On the other hand, random benchmarks come 
with a few caveats. First of all, completely random generation is usually not 
desired, since it could result in many benchmarks that, while drawn from a 
diverse set, are not interesting, e.g., they may be too easy or too difficult to solve 
for existing tools. Secondly, users may be interested in how their implementation 
handles benchmarks with specific properties, for instance those that require long 
chains of computations to reach a conclusion. Finally, if users know what realistic
benchmarks for a certain type of verification or synthesis problem usually look
like, they may want to restrict the random generation to such benchmarks, e.g.,
by forcing them to comply with certain conditions on their structure. 


© The Author(s) 2021 
In this paper we present AIGEN, a tool for random generation of transition
systems in a symbolic representation. We generate transition systems with par-
titioned transition relation, i.e., consisting of sets of Boolean functions. We ensure
diversity at the level of individual Boolean functions by requiring a uniform ran- 
dom sampling over all Boolean functions with a given number of variables. 

While for some application areas there exist tools that generate random 
Boolean functions in a specific form (e.g. randomly generated propositional for- 
mulas in CNF [9,16]), to the best of our knowledge none of these supports 
uniformly random distributions. The obvious benefit of this approach is that
uniform random sampling allows us to make statements about the actual space of Boolean
functions, instead of statements about a specific representation of the functions, 
and these benefits extend to the random generation of transition systems. 

To ensure uniform random sampling, we rely on an enumeration of all Boolean 
functions with a given number of variables, based on their truth tables. From the 
truth tables one can generate in a straightforward way standard canonical repre- 
sentations of the functions, e.g., in canonical disjunctive normal form (CDNF) or 
canonical conjunctive normal form. As a more memory-efficient alternative, we 
developed an encoding that is inspired by data structures used for implementing 
reduced ordered binary decision diagrams (ROBDDs). 

AIGEN implements our ROBDD-based algorithm and a CDNF-based algo- 
rithm. Development of AIGEN was motivated by the evaluation of reactive syn- 
thesis tools [13], and it was used to generate benchmarks for the reactive syn- 
thesis competition (SYNTCOMP) [11,12]. Since the existing benchmark library 
of SYNTCOMP consists mostly of benchmarks that were hand-crafted by tool 
developers, the diversity of benchmarks is limited, and their choice may be 
skewed towards problems or encodings that are well-suited for the existing tools. 
Hence, as an addition to the existing hand-crafted examples, random benchmarks 
are a valuable source of insight into the performance of synthesis algorithms. 


Outline. We introduce BDDs and ROBDDs in Sect. 2. In Sect. 3 we present
our basic idea for the random generation of symbolic transition systems, based
on enumerating Boolean functions. In Sect. 4, we present a detailed description
of the ROBDD-based algorithm, and in Sect. 5 the algorithm based on CDNF.
Finally, in Sect. 6 we present a comparison between the ROBDD and the CDNF 
approaches, and we give details about our implementation and how to effectively 
use the tool to produce diverse benchmarks. 


2 Canonical Representation of Boolean Functions 


A Binary Decision Diagram (BDD) over a set of variables X is a directed acyclic
graph G = (V, E) with $V \subseteq \mathbb{N}$, exactly one root $v_r \in V$, and a labeling on nodes.
Each terminal node $v \in V$ is labeled with a value $val(v) \in \{0, 1\}$. Each non-
terminal node $v \in V$ is labeled with a variable $var(v) \in X$ and has exactly
two outgoing edges, leading to nodes that are denoted by $high(v) \in V$ and
$low(v) \in V$, respectively. Note that if $v \in V$ is a non-terminal node, then the
directed acyclic graph rooted in v is also a BDD. It is called the sub-BDD of G
with root v.

A BDD G = (V, E) over a set of variables X is ordered if on every path from
the root to a terminal node, variables in node labels occur in the same order and
each variable occurs at most once. A BDD is reduced if it does not contain any
of the following:


— non-terminal nodes $v \neq w \in V$ with var(v) = var(w), low(v) = low(w) and
high(v) = high(w),

— terminal nodes $v \neq w \in V$ with val(v) = val(w),

— a non-terminal node $v \in V$ with low(v) = high(v).


Any ordered BDD can be transformed into a reduced BDD by using the 
isomorphism and Shannon reductions (cf. [10]). A BDD that is reduced and
ordered is called a Reduced Ordered Binary Decision Diagram (ROBDD). 

Note that in an ROBDD, a triple (x, high(v),low(v)) of a node v, where 
x = var(v), uniquely defines a sub-ROBDD. This implies that ROBDDs are a 
canonical representation of Boolean functions [10], i.e., for a fixed variable order 
there is a unique ROBDD representation for every Boolean function. 


3 Enumerating Boolean Functions 


Based on a canonical representation of Boolean functions, we define an enumer- 
ation, i.e., a bijective mapping from natural numbers to Boolean functions (or 
ROBDDs), such that any procedure that produces uniformly random natural 
numbers (in some range) can be used to produce uniformly random Boolean 
functions (in some range, see below for details). 
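To illustrate why a bijection from numbers to functions suffices for uniform sampling, the following minimal sketch draws a uniformly random ID and decodes it. Note that `id_to_truth_table` uses the plain truth-table encoding as a stand-in bijection; it is an assumption for illustration and is not the ROBDD-based enumeration developed in this section. Both function names are hypothetical.

```python
import random

def sample_function_id(m, rng=random):
    """Return a uniformly random ID in 1 .. 2**(2**m)."""
    # getrandbits(k) is uniform on 0 .. 2**k - 1; adding one gives 1 .. 2**k.
    return rng.getrandbits(2 ** m) + 1

def id_to_truth_table(function_id, m):
    """Stand-in bijection: read ID - 1 as a truth table with 2**m rows."""
    bits = function_id - 1
    return [(bits >> row) & 1 for row in range(2 ** m)]

# Every ID for m = 2 decodes to a different truth table.
tables = {tuple(id_to_truth_table(i, 2)) for i in range(1, 17)}
print(len(tables))
```

Because each ID decodes to exactly one function and vice versa, a uniform distribution over IDs induces a uniform distribution over functions.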

To define our mapping, we first describe the data structure for ROBDDs 
that is used by various BDD packages. Then we will illustrate the data structure 
we use for ROBDDs and how it guarantees canonicity and uniform random 
distribution. In the following, we assume that $X = \{x_1, \dots, x_m\}$ is a set of
variables with a fixed order. 


Unique Table. BDD packages use the so-called unique table as a data structure 
for storing ROBDD nodes. The unique table of a BDD G = (V, E) over a set of 
variables X is a hash table that establishes a bijection between nodes v € V and 
triples $(x, h, l) \in X \times V \times V$ that uniquely identify them, where x = val(v) if v
is a terminal node, and x = var(v) otherwise, h = high(v) and l = low(v).
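A unique table of this kind can be sketched in a few lines. The class below is a hypothetical illustration of the hash-consing idea and not the API of any particular BDD package; for brevity it fixes the identifiers 0 and 1 for the two terminal nodes, which is an assumption of this sketch.

```python
class UniqueTable:
    """Hash-consing table for ROBDD nodes (illustrative only)."""

    def __init__(self):
        # Terminal nodes 0 and 1 use the fixed identifiers 0 and 1.
        self.triple_to_node = {}
        self.node_to_triple = {}
        self.next_id = 2

    def mk(self, var, high, low):
        """Return the node for (var, high, low), creating it only if needed."""
        if high == low:                 # redundant test: skip the node
            return high
        key = (var, high, low)
        node = self.triple_to_node.get(key)
        if node is None:                # share structurally equal nodes
            node = self.next_id
            self.next_id += 1
            self.triple_to_node[key] = node
            self.node_to_triple[node] = key
        return node

table = UniqueTable()
x1 = table.mk(1, 1, 0)                  # the function x1
print(x1 == table.mk(1, 1, 0))          # the same triple yields the same node
```

Routing all node creation through `mk` enforces the two reduction rules for non-terminal nodes by construction, so every diagram built this way is reduced.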


Virtual ROBDD Table. We will use the ideas from the unique table that is 
used in BDD packages to define the virtual ROBDD table that enumerates all 
possible ROBDDs with respect to our variable order. This table can of course 
not be constructed explicitly, but the idea of this table can be used to define 
a (bijective) mapping from natural numbers to ROBDDs. We want to generate 
random Boolean functions that are based on a uniform distribution. For this
reason the algorithm randomly generates a natural number $bddID \le 2^{2^m}$ (since
there are $2^{2^m}$ different Boolean functions of type $\mathbb{B}^m \to \mathbb{B}$), then computes a
unique triple similar to the one above that corresponds to bddID, and then
iteratively builds the complete ROBDD.

For the sake of illustrating how the algorithm computes the triple, assume 
that there exists a table, called Virtual ROBDD Table (or short: VRT), that 
maps natural numbers to ROBDDs, identified by a triple of variable index, and 
high and low children. In other words, every entry in the table uniquely maps a
number $bddID \in \mathbb{N}$ (i.e., a BDD node) to a triple (level, high, low) where level is
a variable index, high = high(bddID), and low = low(bddID). As in the unique
table, none of the entries (i.e., ROBDDs) appears twice. However, in contrast
to the unique table, the VRT is based on the fixed variable order, and uses the 
variable index in this order instead of the variable itself. Table 1 depicts a sketch 
of the VRT. 


Table 1. VRT: Entries in the table are in ascending order over bddID. Each row is
annotated with a level and a sublevel. $L_i$ denotes the $i$-th level, containing all triples
with variable index i. The sublevel $sl_{i,j}$ denotes the $j$-th sublevel of $L_i$, which contains
all triples of $L_i$ in which j is the high or the low child, and the other child j' is a
bddID that belongs to a level $L_{i'}$ with i' < i such that [j.j'] has not appeared before
in $L_i$. Each cell in a row annotated with $L_i$ and $sl_{i,j}$ is of the form (bddID)[high.low],
where bddID is the unique identifier of the triple (i, high, low). Let $Y_1 = 2^{2^{i-1}}$ and
$Y_2 = \sum_{m=1}^{j-1} 2(Y_1 - m)$.

L_0                   (1)[0]     (2)[1]
L_1   sl_{1,1}        (3)[1.2]   (4)[2.1]
L_2   sl_{2,1}        (5)[1.2]   (6)[2.1]   (7)[1.3]   (8)[3.1]   (9)[1.4]   (10)[4.1]
      sl_{2,2}        (11)[2.3]  (12)[3.2]  (13)[2.4]  (14)[4.2]
      sl_{2,3}        (15)[3.4]  (16)[4.3]
...
L_i   sl_{i,1}        (Y_1+1)[1.2]   (Y_1+2)[2.1]   ...   (Y_1+2(Y_1-1))[Y_1.1]
      sl_{i,j}        (Y_1+Y_2+1)[j.j+1]   ...   (Y_1+Y_2+2(Y_1-j))[Y_1.j]
      sl_{i,Y_1-1}


Note that a bddID between 1 and $2^{2^m}$ corresponds to a Boolean function with
at most m input variables, and a bddID between $2^{2^{m-1}} + 1$ and $2^{2^m}$ corresponds
to a function with exactly m input variables. Thus, to uniformly sample Boolean
functions, we can use a random number generator that uniformly samples natural
numbers in such a range.
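For small levels, the ordering described in the caption of Table 1 can be spelled out by brute force. The sketch below is for illustration only, since the tool never materializes the table; `vrt_level` is a hypothetical helper that lists the entries of one level in table order, following the rule that sublevel j pairs the child j with every larger ID j' from the lower levels, once in each order.

```python
def vrt_level(i):
    """List the (bddID, high, low) entries of level i >= 1 in table order."""
    n = 2 ** (2 ** (i - 1))           # number of IDs on all lower levels
    entries, bdd_id = [], n
    for j in range(1, n):             # sublevel j: j paired with every j2 > j
        for j2 in range(j + 1, n + 1):
            entries.append((bdd_id + 1, j, j2))
            entries.append((bdd_id + 2, j2, j))
            bdd_id += 2
    return entries

for bdd_id, high, low in vrt_level(2):
    print(f"({bdd_id})[{high}.{low}]")
```

Each level i contributes n(n - 1) entries for n = 2^(2^(i-1)), so the IDs up to and including level i number n + n(n - 1) = 2^(2^i), which matches the count of Boolean functions over i variables.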
[Figure: BDD with root node (16)[4.3] at level 2 and its descendants at levels 1 and 0.]

Fig. 1. BDD generated for number 16. Equivalent to the Boolean function $x_2 x_1 + \bar{x}_2 \bar{x}_1$.
The numbers on the left of the BDD represent the level, i.e., the corresponding variable
indices.


It is important to remember that the VRT is not constructed explicitly.
Instead, given a number of variables m, and based on the predefined order-
ing of ROBDDs in the VRT ($2^{2^m}$ ROBDDs), the algorithm first generates a
random number $bddID \le 2^{2^m}$, then computes the triple (level, high, low) to
which bddID maps. We note: level (or i) is equal to $\lceil \log_2(\log_2(bddID)) \rceil$. Let
$Y_1 = 2^{2^{i-1}}$, then we solve the following system of inequalities to compute x, which
determines the sublevel:

$Y_1 + 2(Y_1 - 1) + \dots + 2(Y_1 - x) < bddID$
$Y_1 + 2(Y_1 - 1) + \dots + 2(Y_1 - (x + 1)) \ge bddID$

High and low are then computed according to what is given in the table,
see Sect. 4 for more details. Figure 1 shows the BDD generated for bddID = 16,
which is equivalent to $x_2 x_1 + \bar{x}_2 \bar{x}_1$.
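The level and the sublevel index can be computed with exact integer arithmetic, as in the following sketch. The function names are hypothetical, and replacing the logarithms by a loop is a choice made here to avoid floating-point rounding on very large IDs; it is not necessarily how the tool computes them.

```python
def level_of(bdd_id):
    """Smallest i with bddID <= 2**(2**i), i.e. ceil(log2(log2(bddID)))."""
    i = 0
    while 2 ** (2 ** i) < bdd_id:
        i += 1
    return i

def sublevel_index(bdd_id, i):
    """The x with Y1 + ... + 2(Y1 - x) < bddID <= Y1 + ... + 2(Y1 - (x + 1))."""
    y1 = 2 ** (2 ** (i - 1))
    total, x = y1, 0
    while total + 2 * (y1 - (x + 1)) < bdd_id:
        total += 2 * (y1 - (x + 1))
        x += 1
    return x

i = level_of(16)
print(i, sublevel_index(16, i))
```

For bddID = 16 this gives level 2 and x = 2: the sum 4 + 6 + 4 = 14 is below 16, while adding the next term reaches 16. In Algorithm 1 the first child is then x + 1 = 3, which agrees with the entry (16)[4.3].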


4 Random Generation of (Controllable) Transition 
Systems 


In this section we present our algorithm for generating random transition sys- 
tems, represented as AIGER circuits [5]. We use a generalization of the usual 
notion of transition systems that allows some of the input signals to be declared 
as controllable. This is useful to define synthesis problems, i.e., a synthesis pro- 
cedure can define how these inputs should behave depending on the state and 
uncontrollable inputs of the system. 

A controllable transition system (or short: controllable system) TS is a 6-
tuple $(L, X_u, X_c, F, BAD, q_0)$, where L is a set of state variables (also called
latches), $X_u$ is a set of uncontrollable input variables, $X_c$ is a set of controllable
input variables, $F = (f_1, \dots, f_{|L|})$ with $f_i : \mathbb{B}^{L} \times \mathbb{B}^{X_u} \times \mathbb{B}^{X_c} \to \mathbb{B}$ is a vector of
update functions for the latches, $BAD : \mathbb{B}^{L} \to \mathbb{B}$ is the set of unsafe states, and
$q_0$ is the initial state where all latches are initialized to 0.
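The 6-tuple can be mirrored by a small data structure. The following is a minimal sketch under the assumption that valuations are tuples of bits; the class and field names are hypothetical and do not reflect AIGEN's internal representation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]
UpdateFn = Callable[[Bits, Bits, Bits], int]  # (latches, x_u, x_c) -> new value

@dataclass
class ControllableSystem:
    num_latches: int
    num_uncontrollable: int
    num_controllable: int
    updates: Sequence[UpdateFn]      # one update function per latch
    bad: Callable[[Bits], int]       # characteristic function of BAD

    def initial_state(self) -> Bits:
        return (0,) * self.num_latches          # q0: all latches are 0

    def step(self, state: Bits, x_u: Bits, x_c: Bits) -> Bits:
        return tuple(f(state, x_u, x_c) for f in self.updates)

# One latch that is toggled whenever the single controllable input is 1.
toggle = ControllableSystem(1, 0, 1, [lambda s, u, c: s[0] ^ c[0]], lambda s: s[0])
state = toggle.step(toggle.initial_state(), (), (1,))
print(state, toggle.bad(state))
```

In this toy system a controller can keep the latch at 0, and thus avoid the unsafe state, by never setting the controllable input.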

Then, the idea of our tool for random generation of transition systems can 
be summarized in the following way: 


— The user input determines parameters of the system, such as the number of 
latches and controllable or uncontrollable inputs. 
— For every latch, we generate a random Boolean function that determines how
this latch is updated based on the current state and input of the system,
represented as an ROBDD as described in Sect. 3.

— Additionally, we generate a random Boolean function that determines the set 
of unsafe states of the system. 

— The system composed of these functions is then encoded into an AIGER 
circuit. 
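To make the last step more concrete, the sketch below renders a circuit in the ASCII variant of the AIGER format ("aag"). The helper `write_aag` is hypothetical, and numbering the inputs first is a simplifying assumption; how AIGEN itself lays out inputs, latches, and the output for the unsafe states is not specified in the text.

```python
def write_aag(num_inputs, latches, outputs, ands):
    """Render a circuit in the ASCII AIGER ("aag") format.

    latches: list of (latch_literal, next_state_literal)
    outputs: list of literals
    ands:    list of (lhs_literal, rhs0_literal, rhs1_literal)
    Variable v has literal 2*v, its negation is 2*v + 1; 0 and 1 are false/true.
    Assumption of this sketch: inputs are the variables 1 .. num_inputs.
    """
    max_var = num_inputs + len(latches) + len(ands)
    lines = [f"aag {max_var} {num_inputs} {len(latches)} {len(outputs)} {len(ands)}"]
    lines += [str(2 * (k + 1)) for k in range(num_inputs)]
    lines += [f"{cur} {nxt}" for cur, nxt in latches]
    lines += [str(lit) for lit in outputs]
    lines += [f"{lhs} {r0} {r1}" for lhs, r0, r1 in ands]
    return "\n".join(lines) + "\n"

# A toggle flip-flop: one latch whose next state is its own negation.
print(write_aag(0, [(2, 3)], [2, 3], []), end="")
```

The example has no inputs and no AND gates; the latch with literal 2 is updated with literal 3, its negation, and both polarities are exposed as outputs.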


4.1 Random Generation Algorithm 


The procedure GENERATERANDOMAIGER takes as input the number of latches
l, uncontrollable inputs u, controllable inputs c, the bound o, and optionally a list of
seeds (i.e., natural numbers used to initialize a pseudorandom number genera-
tor). As output it produces a file in AIGER format.

Lines 3-6 generate for every latch a random ROBDD that represents an
update function $\mathbb{B}^{l+c+u} \to \mathbb{B}$ for the latch, i.e., a function that takes all current
values of inputs and latches as input, and returns a new value for the given
latch. Line 4 generates a random integer with $2^{vars}$ random bits, i.e., a natural
number between 1 and $2^{2^{vars}}$. All the seeds used for generating the random
integers will be written in the comment section at the end of the generated file. 
These seeds can be fed to the algorithm in order to regenerate the same instance. 
Line 5 constructs the ROBDD that corresponds to the generated number. Line 
6 converts the constructed ROBDD into an AIG (And-Inverter Graph) relying 
on the fact that a BDD can be seen as a network of multiplexers. 
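The multiplexer view can be sketched as follows: every non-terminal node becomes an if-then-else over its variable, and the if-then-else is expressed with AND and NOT only. The helper names are hypothetical; the example table uses the ID scheme of Sect. 3, in which the IDs 1 and 2 stand for the terminal nodes 0 and 1, and values are evaluated directly instead of emitting gates.

```python
TERMINAL_VALUE = {1: 0, 2: 1}          # IDs 1 and 2 are the terminals 0 and 1

def mux_as_and_not(select, high, low):
    """ITE(select, high, low) on 0/1 values, using only AND and NOT."""
    a = select & high                  # select AND high
    b = (1 - select) & low             # (NOT select) AND low
    return 1 - ((1 - a) & (1 - b))     # a OR b, written as NOT(NOT a AND NOT b)

def eval_bdd(node, table, assignment):
    """Evaluate a BDD given as {node: (var_index, high, low)}."""
    if node in TERMINAL_VALUE:
        return TERMINAL_VALUE[node]
    var, high, low = table[node]
    return mux_as_and_not(assignment[var],
                          eval_bdd(high, table, assignment),
                          eval_bdd(low, table, assignment))

# The ROBDD with ID 16 and its two children.
table = {3: (1, 1, 2), 4: (1, 2, 1), 16: (2, 4, 3)}
print([eval_bdd(16, table, {1: x1, 2: x2}) for x2 in (0, 1) for x1 in (0, 1)])
```

Each multiplexer costs three AND gates in this encoding, so the size of the resulting AIG is linear in the number of ROBDD nodes.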

Lines 8-10 construct the ROBDD of the function $f_{BAD} : \mathbb{B}^o \to \mathbb{B}$, which
uses $o \le l$ latch variables. The set of unsafe states BAD is then defined as
$f_{BAD}(l_{i_1}, \dots, l_{i_o})$, conjoined with a constraint on the remaining latches $l_j$ with
$j \notin \{i_1, \dots, i_o\}$, where the indices $\{i_1, \dots, i_o\}$ are also
picked randomly. Line 11 creates the AIGER file that corresponds to the total
number of variables and to the update functions that were randomly generated.
Line 12 uses the ABC [7] tool to reduce the size of the generated AIGER file.

CONSTRUCTBDD is a recursive procedure for constructing all the nodes of 
the ROBDD that corresponds to the unique ID bddID. It starts with the root 
node and recursively proceeds to the child nodes until it reaches the nodes 0 or 
1. Line 14 checks if the node was already created. If not, Line 15 computes the
triple (level, high, low) that uniquely represents the node, and Line 16 adds it to the table
robddTable. Lines 17-18 construct the child nodes. Note that the robddTable is
initialized with the IDs 1 and 2 which correspond respectively to nodes 0 and 1.

Given an ID, procedure GETCHILDREN computes the triple (level, high, low).
Line 20 computes the level. Lines 21-24 compute the sublevel. Note that, as
depicted in Table 1, a sublevel $sl_{i,j}$ has size $2(2^{2^{i-1}} - j)$, where $2^{2^{i-1}}$ is the sum
of the sizes of all levels that are smaller than i. To compute the sublevel, we
have to compute the single solution of the system of inequalities in Lines 22 and
23 (cf. Table 1). Line 25 computes the ID that immediately precedes the left-most
entry of the sublevel. Lines 26-27 compute the ID of the second child node, and
Lines 28-30 check which node is the low edge and which node is the high edge.
Algorithm 1. Generate Random Aiger

 1: procedure GENERATERANDOMAIGER(l, u, c, o)
 2:     vars ← l + u + c, l' ← l, robddTable ← [(1, 0), (2, 1)]
 3:     while l' > 0 do
 4:         rand_fct_ID ← random.getrandbits(2^vars) + 1
 5:         CONSTRUCTBDD(rand_fct_ID, robddTable)
 6:         aigerTable[rand_fct_ID] ← CONVERTTOAIG(rand_fct_ID, robddTable)
 7:         l' ← l' − 1
 8:     bad_ID ← random.getrandbits(2^o) + 1
 9:     CONSTRUCTBDD(bad_ID, robddTable)
10:     aigerTable[bad_ID] ← CONVERTTOAIG(bad_ID, robddTable)
11:     aigerFilePath ← CREATEAIGER(aigerTable)
12:     ABCMINIMIZE(aigerFilePath)

13: procedure CONSTRUCTBDD(bddID, robddTable)
14:     if bddID ∉ robddTable then
15:         (level, high, low) ← GETCHILDREN(bddID)
16:         robddTable[bddID] ← (level, high, low)
17:         CONSTRUCTBDD(high, robddTable)
18:         CONSTRUCTBDD(low, robddTable)

19: procedure GETCHILDREN(bddID)
20:     level ← ⌈log2(log2(bddID))⌉
21:     n ← 2^(2^(level−1))
22:     sl_i ← COMPUTEASOL(n + 2(n − 1) + ... + 2(n − x) < bddID,
23:                        n + 2(n − 1) + ... + 2(n − (x + 1)) ≥ bddID)
24:     child1 ← sl_i + 1
25:     sl1ID ← n + 2(n − 1) + ... + 2(n − sl_i)
26:     sle ← bddID − sl1ID
27:     child2 ← child1 + ⌈sle/2⌉
28:     if sle mod 2 ≠ 0 then
29:         return (level, child1, child2)
30:     return (level, child2, child1)
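The procedures GETCHILDREN and CONSTRUCTBDD translate almost directly into Python. The sketch below is one possible reading of the pseudocode, not the tool's source: the level and sublevel are found by integer loops rather than logarithms, the table is a dictionary, and `evaluate` and `truth_table` are hypothetical helpers added only to check that the first 2^(2^m) IDs denote pairwise distinct functions.

```python
from itertools import product

TERMINALS = {1: 0, 2: 1}                      # IDs 1 and 2 are the nodes 0 and 1

def get_children(bdd_id):
    """Triple (level, high, low) for bdd_id >= 3, following Lines 19-30."""
    level = 0
    while 2 ** (2 ** level) < bdd_id:
        level += 1
    n = 2 ** (2 ** (level - 1))
    prev_end, sl = n, 0                       # prev_end: last ID before the sublevel
    while prev_end + 2 * (n - (sl + 1)) < bdd_id:
        prev_end += 2 * (n - (sl + 1))
        sl += 1
    child1 = sl + 1
    sle = bdd_id - prev_end                   # 1-based offset inside the sublevel
    child2 = child1 + (sle + 1) // 2          # ceil(sle / 2)
    return (level, child1, child2) if sle % 2 else (level, child2, child1)

def construct_bdd(bdd_id, table):
    """Add bdd_id and all its descendants to table (Lines 13-18)."""
    if bdd_id in TERMINALS or bdd_id in table:
        return
    level, high, low = get_children(bdd_id)
    table[bdd_id] = (level, high, low)
    construct_bdd(high, table)
    construct_bdd(low, table)

def evaluate(bdd_id, table, assignment):
    """Follow high/low edges; assignment[i] is the value of variable i."""
    while bdd_id not in TERMINALS:
        level, high, low = table[bdd_id]
        bdd_id = high if assignment[level] else low
    return TERMINALS[bdd_id]

def truth_table(bdd_id, m):
    table = {}
    construct_bdd(bdd_id, table)
    rows = []
    for bits in product((0, 1), repeat=m):
        assignment = dict(zip(range(1, m + 1), bits))
        rows.append(evaluate(bdd_id, table, assignment))
    return tuple(rows)

print(get_children(16))
print(len({truth_table(i, 3) for i in range(1, 257)}))
```

Obtaining as many distinct truth tables as there are IDs is what one expects from a bijective enumeration; the exhaustive check is only feasible for very small m.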


5 CDNF-based Algorithm 


An obvious alternative to our ROBDD approach is to make use of the canonical 
disjunctive or conjunctive normal forms to generate random Boolean functions. 
Algorithm 2 employs CDNF as it is easier to convert to And-Inverter graph. 
CDNF is usually constructed directly from a truth table by taking the OR of all 
satisfying assignments. To convert a Boolean formula $f_i = cl_1 \vee cl_2 \vee \dots \vee cl_n$ in
CDNF to AIG, we consider its equivalent $f_i' = \neg(\neg cl_1 \wedge \neg cl_2 \wedge \dots \wedge \neg cl_n)$.
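The equivalence used here is De Morgan's law, which can be checked exhaustively for a fixed number of clauses. The following sketch does so for three clause values; the function names are illustrative.

```python
from itertools import product

def or_of(clauses):
    """Plain disjunction of 0/1 clause values."""
    return int(any(clauses))

def and_not_form(clauses):
    """NOT(AND of negated clauses), which needs only AND and NOT gates."""
    conj = 1
    for cl in clauses:
        conj &= 1 - cl          # AND with the negated clause
    return 1 - conj             # final negation

# Exhaustive check over all values of three clauses.
assert all(or_of(v) == and_not_form(v) for v in product((0, 1), repeat=3))
```

Since an And-Inverter Graph offers only AND gates and inverters, this rewritten form can be emitted gate by gate without any further conversion.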
The procedure DNFGENERATERANDOMAIGER takes as input the number
of latches l, uncontrollable inputs u, controllable inputs c, the bound o, and
produces a file in AIGER format as output. Lines 3-6 generate a random update
function for every latch. Line 4 generates a random bit vector of size $2^{vars}$.
Algorithm 2. Random Aiger generation using DNF approach

 1: procedure DNFGENERATERANDOMAIGER(l, u, c, o)
 2:     vars ← l + u + c, l' ← 0
 3:     while l' < l do
 4:         truthTable ← random.getrandbits(2^vars)
 5:         dnfFormula ← CONSTRUCTDNF(truthTable, vars)
 6:         aigerTable[l'] ← CONVERTTOAIG(dnfFormula)
 7:         l' ← l' + 1
 8:     badTruthTable ← random.getrandbits(2^o)
 9:     badDnfFormula ← CONSTRUCTDNF(badTruthTable, o)
10:     aigerTable[l'] ← CONVERTTOAIG(badDnfFormula)
11:     aigerFilePath ← CREATEAIGER(aigerTable)
12:     ABCMINIMIZE(aigerFilePath)

13: procedure CONSTRUCTDNF(bitVec, vars)
14:     dnfFormula ← True, i ← 0
15:     while i < bitVec.size() do
16:         if bitVec[i] = 1 then
17:             clauseBitvec ← TOBINARY(i, vars)
18:             dnfClause ← TOCLAUSE(clauseBitvec)
19:             dnfFormula ← dnfFormula ∧ negate(dnfClause)
20:     return negate(dnfFormula)


This bit vector represents the valuation of all the minterms¹ of the truth table
that represents the random function $f_i$. For instance, if the left-most bit of the
bit vector is equal to 1, then $x_{c_0} = 0, \dots, x_{c_{|c|-1}} = 0$, $x_{u_0} = 0, \dots, x_{u_{|u|-1}} = 0$,
$x_{l_0} = 0, \dots, x_{l_{|l|-1}} = 0$ is a satisfying assignment of $f_i$. Similarly, if the last
element of the bit vector is equal to 1, then $x_{c_0} = 1, \dots, x_{c_{|c|-1}} = 1$, $x_{u_0} = 1, \dots, x_{u_{|u|-1}} = 1$,
$x_{l_0} = 1, \dots, x_{l_{|l|-1}} = 1$ is a satisfying assignment of $f_i$. Line
5 builds the random function that corresponds to the generated bit vector, and
Line 6 converts it to AIG. Lines 8-10 generate the output random function, and
Lines 11 and 12 create the AIGER file and call ABC to minimize it.

The procedure CONSTRUCTDNF takes as input a bit vector and the number
of variables and generates the corresponding Boolean function. Line 14 initializes
the DNF function to be created. For every element in the bit vector, if the
i-th element is equal to 1 (Line 16) then, in order to obtain the corresponding
minterm, Line 17 converts the integer i to binary. For instance, if i =
3 and vars = 3, then the bit pattern 011 is obtained. Line 18 creates
the corresponding minterm. Line 19 negates the created clause and adds it to
the DNF formula. Line 20 returns the negation of the constructed formula. As
mentioned earlier, as the formula represented by the truth table is in DNF, we
need to generate its equivalent that includes only AND and NOT logical gates.
For instance, given a formula $f_i = cl_1 \vee cl_2 \vee \dots \vee cl_n$ in CDNF, we construct its
equivalent $f_i' = \neg(\neg cl_1 \wedge \neg cl_2 \wedge \dots \wedge \neg cl_n)$.
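The construction can be sketched in Python as follows. This is an illustration under two assumptions that the text does not fix: the truth table is passed as a list of bits rather than as one large integer, and the binary representation of i is read most significant bit first. Both function names are hypothetical.

```python
from itertools import product

def construct_dnf(truth_table, num_vars):
    """Return the minterms of the function, each as a tuple of bits.

    Entry i of truth_table belongs to the assignment whose bits are the
    binary representation of i (most significant bit first, by assumption).
    """
    minterms = []
    for i, bit in enumerate(truth_table):
        if bit == 1:
            minterms.append(tuple(int(b) for b in format(i, f"0{num_vars}b")))
    return minterms

def eval_and_not(minterms, assignment):
    """Evaluate NOT(AND over k of NOT(minterm_k)) on one assignment."""
    conj = 1
    for m in minterms:
        holds = int(all(a == b for a, b in zip(assignment, m)))
        conj &= 1 - holds        # AND with the negated minterm
    return 1 - conj              # final negation

tt = [0, 0, 0, 1, 0, 1, 0, 0]
minterms = construct_dnf(tt, 3)
print(minterms)
print([eval_and_not(minterms, a) for a in product((0, 1), repeat=3)] == tt)
```

Evaluating the AND/NOT form on all assignments reproduces the original truth table, which is the property that the conversion to an And-Inverter Graph relies on.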


1 A minterm of n variables is a product (logical AND) of the variables in which each 
appears exactly once in uncomplemented or complemented form. 
6 Implementation and Evaluation


AIGEN is implemented in Python, and a virtual machine with the tool ready to
run is available at https://doi.org/10.5281/zenodo.4721314 [14]. The source code
of AIGEN is also publicly available at https://github.com/mhdsakr/AIGEN-Tool,
allowing interested users to add functionality, e.g., in order to add further
parameters to generate only Boolean functions or transition systems with certain
properties. It uses the mpmath [15] library together with GMPY [1] to deal with
large numbers. By default, mpmath uses Python integers; however, if GMPY is
also installed on the operating system, mpmath will automatically detect it and
use gmpy integers instead. This makes mpmath perform much faster, particularly
at high precision (approximately above 100 digits). Furthermore, AIGEN uses
ABC [7], and the AIGER tool set [6] to post-process AIGER circuits.

AIGEN has been used to generate thousands of random transition systems. 
Figures 2, 3, and 4 show average times and sizes for generating systems where,
for example, 4.3.7 denotes systems with 4 controllable inputs, 3 uncontrollable
inputs, and 7 latches (o = l = 7). These times were measured on a laptop with
a quad-core i7-6600U CPU at 2.6 GHz and 20 GB RAM.


444 S. Jacobs and M. Sakr 


Figures 4 and 2 compare the average running time and the average number of
AND-gates between the ROBDD and DNF approaches. These results were obtained
without the ABC tool (i.e., the command “ABCMinimize(aigerFilePath)” was
skipped). Figure 4 shows that the DNF approach was faster in all cases, which was
expected because generating a random ROBDD is much more complex
than generating a truth table. Figure 2 shows that the ROBDD approach
yields far fewer AND-gates in all cases. Figure 3 compares the average running time of the
ROBDD and DNF approaches, including the time needed for the ABC tool to
minimize the generated transition system. Benchmarks 8.8.4, 9.9.4, and 10.11.2
timed out for the DNF approach (we used 10 h as the time limit), because the
ABC tool needed a lot of time to process these benchmarks. A thorough
inspection showed that the cause was, in addition to the huge size of these circuits, the
very long chains of AND-gates for every generated Boolean function. Figure 3
thus shows that the total running time of the tool was much better with
the ROBDD approach.


The Effect of Parameters. Although the benchmarks are randomly generated, 
AIGEN allows the user to choose the input parameters to obtain benchmarks 
with certain properties that correspond to their needs, for example: 


— The degree of the generated graph (i.e., the transition system) is equal to $2^{u+c}$;
therefore, increasing the ratio (u + c)/l will make the graph more congested
and consequently more complex. 

— The parameter o gives the user the ability to determine the size of the set of
unsafe states, i.e., the number of unsafe states cannot exceed $2^{o}$. Accordingly,
increasing the ratio o/l will increase the probability that the error set is 
reachable, and decreasing this ratio will lower the probability. 

— Increasing the ratio c/u will increase the probability that the benchmark is 
realizable, and decreasing it will serve the opposite goal. Moreover, if this 
ratio is close to 1 the realizability check will be harder, since the probability 
of realizability will be roughly equal to the probability of unrealizability. 
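
The quantities in the list above are simple to compute for a given parameter choice. The following Python sketch is only an illustration and is not part of AIGEN; the function name and the dictionary keys are hypothetical. It derives the graph degree, the bound on the number of unsafe states, and the three ratios from c, u, l, and o.

```python
from fractions import Fraction


def describe_parameters(c, u, l, o):
    """Derived quantities for c controllable inputs, u uncontrollable inputs,
    l latches and parameter o (names of the keys are illustrative only)."""
    return {
        "degree": 2 ** (u + c),                    # out-degree of each state
        "max_unsafe_states": 2 ** o,               # upper bound on unsafe states
        "input_latch_ratio": Fraction(u + c, l),   # (u + c) / l
        "unsafe_ratio": Fraction(o, l),            # o / l
        "control_ratio": Fraction(c, u),           # c / u, requires u > 0
    }
```

For the 4.3.7 setting with o = 7, this gives a degree of 128 and a control ratio of 4/3.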


To demonstrate the effect of these parameters, Table 2 shows the running
time and results (realizable or unrealizable) of the synthesis tool SimpleBDDSolver
on selected benchmarks, generated using the ROBDD-based approach, in
SYNTCOMP 2019. SimpleBDDSolver has won all previous iterations of the SYNTCOMP
competition. A benchmark name contains the parameters that were used
to generate the file, e.g., random_n_19_1_3_15_14_1_abc means that the benchmark
has in total 19 variables with 1 controllable input, 3 uncontrollable inputs,
15 latches, and o = 14. The table shows that the example benchmarks with
ratio c/u = 1/3 or c/u = 1/5 were unrealizable, the benchmarks with ratio
c/u = 2 were realizable, while benchmarks with ratio c/u = 1/2 were difficult
to solve for the tool, which timed out while trying to solve them. Note that a
benchmark with c/u = 1/5 can still be realizable, and one with c/u = 2 can be
unrealizable; this is just unlikely for a randomly generated
benchmark.
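
The naming scheme can be decoded mechanically. The Python sketch below is an illustration under assumptions: the text explains only the fields for the total number of variables, c, u, the number of latches, and o, so the last numeric field is kept under the hypothetical name `index`, and the helper itself is not part of AIGEN.

```python
import re

# Pattern assumed from the benchmark names listed in Table 2.
NAME = re.compile(r"random_n_(\d+)_(\d+)_(\d+)_(\d+)_(\d+)_(\d+)_abc")


def parse_benchmark_name(name):
    """Split a benchmark name into its generation parameters."""
    match = NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected benchmark name: {name}")
    total, c, u, latches, o, index = map(int, match.groups())
    if total != c + u + latches:
        raise ValueError("total does not equal c + u + latches")
    # 'index' is a hypothetical label; the text does not explain this field.
    return {"total": total, "c": c, "u": u, "latches": latches, "o": o, "index": index}
```

For random_n_19_1_3_15_14_1_abc this returns c = 1, u = 3, 15 latches, and o = 14, so c/u = 1/3.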
Table 2. Results of SimpleBDDSolver on selected random benchmarks generated by
AIGEN in SYNTCOMP 2019 [2]


Benchmark | Time (s) | Result
random_n_19_1_3_15_14_1_abc | 3412.41 | UNREALIZABLE
random_n_19_1_5_13_13_2_abc | 1361.39 | UNREALIZABLE
random_n_19_1_2_16_14_8_abc | Timeout | —
random_n_19_1_4_14_13_11_abc | Timeout | —
random_n_19_4_2_13_12_11_abc | 43.68 | REALIZABLE
random_n_19_4_2_13_12_12_abc | 35.71 | REALIZABLE
random_n_19_4_2_13_12_3_abc | 240.61 | REALIZABLE
random_n_19_4_2_13_12_62_abc | 299.5 | REALIZABLE
random_n_19_4_2_13_12_95_abc | 258.92 | REALIZABLE


7 Conclusion 


We have presented AIGEN, a tool for the generation of random transition systems
in a symbolic representation, using either ROBDDs or CDNF for representing
Boolean functions. Although the ROBDD-based approach generates much
smaller symbolic transition systems, the CDNF approach is faster when the ABC
minimization procedure is disabled. In contrast to the ROBDD approach,
generating a random formula in CDNF requires no complex computation. However,
when using minimization, the huge size of these formulas becomes a problem for
ABC, as it has to inspect all the generated AND-gates.

In future work, instead of using a fixed variable order, we will also allow
the use of a random order. The drawback of a fixed order is that some Boolean
functions only have a large ROBDD representation, even though smaller ones
exist with different orderings, and vice versa. Going further, we plan to include
variable reordering techniques to find an order that leads to small ROBDDs at
runtime. Finally, we also plan to investigate the use of AIGEN for finding bugs
in verification and synthesis tools.
Abstract. The effective parallelisation of Bounded Model Checking is
challenging, due to SAT and SMT solving being hard to parallelise. We 
present PARAFROST, which is the first tool to employ a graphics proces- 
sor to accelerate BMC, in particular the simplification of SAT formulas 
before and repeatedly during the solving, known as pre- and inprocessing. 
The solving itself is performed by a single CPU thread. We explain the 
design of the tool, the data structures, and the memory management, the 
latter having been particularly designed to handle SAT formulas typically 
generated for BMC, i.e., that are large, with many redundant variables. 
Furthermore, the solver can make multiple decisions simultaneously. We 
discuss experimental results, having applied PARAFROST on programs 
from the Core C99 package of Amazon Web Services. 


Keywords: Bounded model checking · SAT solving · GPU computing


1 Introduction 


Bounded Model Checking (BMC) [5] determines whether a model M satisfies a
certain property $\varphi$ expressed in temporal logic, by translating the model checking
problem to a propositional satisfiability (SAT) problem or a Satisfiability
Modulo Theories (SMT) problem. The term bounded refers to the fact that the 
BMC procedure searches for a counterexample to the property, i.e., an execution 
trace, which is bounded in length by an integer k. If no counterexample up to this 
length exists, k can be increased and BMC can be applied again. This process 
can continue until a counterexample has been found, a user-defined threshold has 
been reached, or it can be concluded (via k-induction [38]) that increasing k fur- 
ther will not result in finding a counterexample. CBMC [14] is an example of a 
successful BMC model checker that uses SAT solving. CBMC can check ANSI-C 
programs. The verification is performed by unwinding the loops in the program 
under verification a finite number of times, and checking whether the bounded 


M. Osama—This work is part of the GEARS project with project number 
TOP2.16.044, which is (partly) financed by the Netherlands Organisation for Scientific 
Research (NWO). 

© The Author(s) 2021 


A. Silva and K. R. M. Leino (Eds.): CAV 2021, LNCS 12760, pp. 447—460, 2021. 
https://doi.org/10.1007/978-3-030-81688-9_21 


448 M. Osama and A. Wijs 


[Figure: two plots with y-axis "Reduction efficiency" (in %). Panel (a), x-axis "# CBMC Formulas": the amount of variable redundancy in CBMC formulas. Panel (b), x-axis "# Original Variables (in millions)": the amount of variable redundancy w.r.t. the number of variables.]


Fig. 1. Variable redundancy in CBMC SAT formulas


executions of the program satisfy a particular safety property [22]. These prop- 
erties may address common program errors, such as null-pointer exceptions and 
array out-of-bound accesses, and user-provided assertions. 

The performance of BMC heavily relies on the performance of the solver. 
Over the last decade, efficient SAT solvers [3,6,17,26] have been developed and 
applied for BMC [5,10-12,25]. Effectively parallelising BMC is hard. Parallel 
SAT solving often involves running several solvers, each solving the problem in 
its own way [18]. For BMC, multiple solvers can be used to solve the problem for 
different values of the bound k in parallel [1,21]. However, in these approaches, 
the individual solvers are still single-threaded. 

Recently, Leiserson et al. [23] concluded that in the future, advances in 
computational performance will come from many-threaded algorithms that can 
employ hardware with a massive number of processors. Graphics processors 
(GPUs) are an example of such hardware. Multi-threaded BMC model checkers 
have been proposed, such as in [13,19,35], but these address tens of threads, not 
thousands. 

In this paper, we propose the application of GPUs to accelerate SAT-based 
BMC. To the best of our knowledge, this is the first time this is being addressed. 
Recently, GPUs have been applied for explicit-state model checking and graph 
analysis [8,9,40,41]. In SAT solving, we used GPUs to accelerate test pattern 
generation [31], metaheuristic search [42], preprocessing [32,33] and inprocess- 
ing [34]. In these operations, a given SAT formula is simplified, i.e., it is rewritten 
to a formula with fewer variables and/or clauses, while preserving satisfiability, 
using various simplification rules. In preprocessing, this is only done once before 
the solving starts, while in inprocessing, this is done periodically during the 
solving. While the impact of accelerating these procedures has been demon- 
strated [34], its impact on BMC has not yet been addressed. 

The structure of typical BMC SAT formulas suggests that GPU pre- and 
inprocessing will be effective. Figure 1a shows for a BMC benchmark set taken
from the Core C99 package of Amazon Web Services (AWS)¹ [2], consisting of
168 problems of various data structures, that propositional formulas produced 
by CBMC tend to have a substantial amount of redundant variables that can 


1 We thank Daniel Kroening and Natasha Jebbo for pointing us to this package. 
be removed using simplification procedures. For approximately 50% of the cases,
40% of the variables can be removed. Furthermore, Fig. 1b presents the amount 
of redundancy in relation to the total number of variables in the formula. It 
indicates that when a formula contains one million variables or more, at least 
25% of those are redundant, and often many more. In the benchmark set, the 
maximum number of variables in one formula is 13 million (encoding the verifi- 
cation of the priority-queue shift-down routine), of which 65% is redundant. 
In contrast, the largest formula we encountered in the application track of the 
2013-2020 SAT competitions that is not encoding a verification problem only 
has 0.2 million variables (it encodes a graph coloring problem [29]). 


Contributions. We present the SAT solver PARAFROST that applies Conflict-Driven
Clause Learning (CDCL) [26] with GPU acceleration of pre- and
inprocessing [32-34], tuned for BMC. It has been implemented in CUDA C++ 
v11 [28], is based on CADICAL [6], and interfaces with CBMC. 

Having to deal on a GPU with large formulas with a lot of redundancy offers 
particular challenges. The elimination of variables typically leads to actually 
adding new clauses, and since the amount of memory on a GPU is limited, this 
cannot be done carelessly. Therefore, first of all, we have worked on compacting 
the data structure used to store formula clauses in PARAFROST as much as 
possible, while still allowing for the application of effective solving optimisations. 
Second of all, we introduce memory-aware variable elimination, to avoid running 
out of memory due to adding too many new clauses. In practice, we experienced 
this problem when applying the original procedure of [34] for BMC. 

Additionally, to support BMC, PARAFROST must be an incremental solver, 
i.e., it must exploit that a number of very similar SAT problems are solved in 
sequence [16]. The procedure in [34] does not support this, so we extended it. 

Finally, because of the many variables in BMC SAT formulas, PARAFROST
supports Multiple Decision Making (MDM) in the solving procedure, as pre- 
sented in [30]. With MDM, multiple decisions can be made at once, periodically 
during the solving. In case there are many variables, there is more potential 
to make many decisions simultaneously. We have generalised the original MDM 
decision procedure [30], making it easier to integrate MDM in solvers other than 
MINISAT and GLUCOSE [3]. The effectiveness of MDM in BMC has never been 
investigated before, nor has it been combined with GPU pre- and inprocessing.


2 Background 


SAT Solving. We assume that SAT formulas are in conjunctive normal form
(CNF). A CNF formula is a conjunction of m clauses $C_1 \wedge \dots \wedge C_m$, and each
clause $C_i$ is a disjunction of n literals $\ell_1 \vee \dots \vee \ell_n$. A literal is a Boolean variable $x$
or its negation $\neg x$, also referred to as $\bar{x}$. The domain of all literals is $\mathbb{L}$. A clause
can be interpreted as a set of literals, i.e., $\{\ell_1, \dots, \ell_n\}$ encodes $\ell_1 \vee \dots \vee \ell_n$, and a
SAT formula $S$ as a set of clauses, i.e., $\{C_1, \dots, C_m\}$ encodes $C_1 \wedge \dots \wedge C_m$. With
$Var(C)$, we refer to the set of variables in $C$: $Var(C) = \{x \mid x \in C \vee \bar{x} \in C\}$.
The set $S_\ell$ consists of all clauses in $S$ containing $\ell$: $S_\ell = \{C \in S \mid \ell \in C\}$.
In CDCL, clauses are LEARNT or ORIGINAL. A LEARNT clause has been derived
by the CDCL clause learning process during solving, and an ORIGINAL clause is
part of the formula. We refer with $\mathcal{L}$ to the set of LEARNT clauses.

For a set of assignments $X$, consisting of all literals that have been assigned
true, a formula $S$ evaluates to true iff $\forall C \in S.\ \exists \ell \in C.\ \ell \in X$. When a decision
is made, a literal is picked and added to $X$. Each assignment is associated with a
decision level (time stamp) to monitor the assignment order. We call a clause $C$
unit iff a single literal in it is still unassigned, and the others are assigned false,
i.e., $|Var(C) \setminus Var(X)| = 1$ and $C \cap X = \emptyset$.
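
These set-based definitions translate directly into code. The following Python sketch is an illustration only, not taken from PARAFROST: literals are encoded as non-zero integers (a negative integer denotes a negated variable), clauses as frozensets, and all function names are hypothetical.

```python
def variables(literals):
    """Var(.): the set of variables referred to by a set of literals."""
    return {abs(lit) for lit in literals}


def clauses_with(formula, lit):
    """All clauses of the formula that contain the given literal."""
    return {clause for clause in formula if lit in clause}


def is_satisfied(formula, assigned):
    """True iff every clause contains at least one literal assigned true."""
    return all(any(lit in assigned for lit in clause) for clause in formula)


def is_unit(clause, assigned):
    """True iff exactly one variable of the clause is unassigned and no
    literal of the clause is currently assigned true."""
    unassigned = variables(clause) - variables(assigned)
    return len(unassigned) == 1 and not (clause & assigned)
```

With the assignment {-1}, the clause {1, -2} is unit: variable 2 is the only unassigned variable and literal 1 is false.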


Variable-Clause Elimination (VCE). Variables and clauses can be removed 
from formulas by applying simplification rules [15,20]. They rewrite a formula 
to an equi-satisfiable one with fewer variables and/or clauses. Applying them is 
referred to as pre- and inprocessing, before and during the solving, respectively. 


Incremental Bounded Model Checking. Since 2001, incremental BMC has 
been applied to hardware and software verification [16,39]. It relies on incre- 
mental SAT solving [16]. In CDCL, clauses are learnt during the solving each 
time a wrong decision has been made, to avoid making those decisions again in 
the future. Incremental SAT solving builds on this: when multiple SAT formulas 
with similar characteristics are solved sequentially, then in each iteration, the 
clauses learnt in previous iterations are reused. An efficient approach to add and 
remove clauses is by using assumptions [16], which are initial assignments. 

For BMC, the transition relation of a system design and the (negation of) the
property to be verified are encoded in a SAT formula. A predicate $\mathcal{I}(s_0)$ identifies
the initial states, $\delta(s_i, s_{i+1})$ encodes the transition relation at trace depth $i$, and
$\mathcal{E}(i) = \bigvee_{0 \le j \le i} e(s_j)$ encodes the reachability of an error state up to trace depth $i$,
where $e(s_j)$ is true iff state $s_j$ is an error state. For incremental BMC, additional
unit clauses $\sigma_i$ are used. These predicates are combined to define the following
series of SAT formulas $S(i)$ that must be solved incrementally:


$S(0) = \mathcal{I}(s_0) \wedge (\mathcal{E}(0) \vee \sigma_0)$, under assumption $\neg\sigma_0$
$S(i+1) = S(i) \wedge \delta(s_i, s_{i+1}) \wedge \sigma_i \wedge (\mathcal{E}(i+1) \vee \sigma_{i+1})$, under assumption $\neg\sigma_{i+1}$


Formula $S(i)$ is satisfiable iff an error state is reachable via a trace with a length
up to $i$ [16,39]. At iteration $i + 1$, we know that $\mathcal{E}(i)$, included via $S(i)$, cannot
be satisfied (otherwise iteration $i + 1$ would not have been started). This means
that $\mathcal{E}(i)$ must be removed to avoid that $S(i + 1)$ is unsatisfiable. To effectively
remove $\mathcal{E}(i)$, $\sigma_i$ is assigned true, resulting in $\mathcal{E}(i) \vee \sigma_i$ being satisfied. In general,
at iteration $i$, $\sigma_i$ is assigned false, while in iterations $i' > i$, it is assigned true.
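
To make the incremental scheme concrete, the following Python sketch builds the series of formulas for a toy system and checks each one under the corresponding assumption. Everything in it is an assumption made for illustration: the system is a 2-bit counter that starts at 0, the error states are those in which the high bit is set, the variable numbering is arbitrary, and a brute-force enumeration stands in for a real incremental SAT solver.

```python
from itertools import product


def solve(clauses, num_vars, assumptions):
    """Brute-force satisfiability check under assumptions (toy stand-in)."""
    for bits in product([False, True], repeat=num_vars):
        def holds(lit):
            return bits[abs(lit) - 1] == (lit > 0)
        if all(holds(a) for a in assumptions) and all(
            any(holds(lit) for lit in clause) for clause in clauses
        ):
            return True
    return False


# Hypothetical variable numbering: per step j, high bit, low bit, activation literal.
def hi(j): return 3 * j + 1
def lo(j): return 3 * j + 2
def act(j): return 3 * j + 3


def delta(i):
    """CNF of the counter's transition relation between steps i and i + 1."""
    return [
        [lo(i + 1), lo(i)], [-lo(i + 1), -lo(i)],                    # lo' <-> not lo
        [-hi(i + 1), hi(i), lo(i)], [-hi(i + 1), -hi(i), -lo(i)],    # hi' -> hi xor lo
        [hi(i + 1), -hi(i), lo(i)], [hi(i + 1), hi(i), -lo(i)],      # hi xor lo -> hi'
    ]


def error_clause(i):
    """Error reachable within depth i, or the activation literal of step i."""
    return [hi(j) for j in range(i + 1)] + [act(i)]


def first_error_depth(max_depth):
    clauses = [[-hi(0)], [-lo(0)], error_clause(0)]        # formula for depth 0
    for i in range(max_depth + 1):
        if solve(clauses, 3 * (i + 1), [-act(i)]):         # assume act(i) is false
            return i
        # Extend to depth i + 1: transition, unit clause act(i), new error clause.
        clauses += delta(i) + [[act(i)]] + [error_clause(i + 1)]
    return None
```

The counter reaches the value 2 (high bit set) after two steps, so `first_error_depth(5)` returns 2. Adding the unit clause `[act(i)]` is what neutralises the error clause of the previous depth.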


GPU Programming. CUDA [28] is NVIDIA’s parallel computing platform 
that can be used to develop general purpose GPU programs. A GPU consists of 
multiple streaming multiprocessors (SMs), and each SM contains several stream- 
ing processors (SPs). A GPU program consists of a host part, executed on a 
CPU, and device functions, or kernels, executed on a GPU. Each time a kernel 
is launched, the number of threads that need to execute it is given. On the SPs, 
the threads are executed. Compared to a CPU thread, GPU threads perform 
Fig. 2. An activity diagram for the workflow of PARAFROST.


a relatively simple task. In particular, they read some data, perform a com- 
putation, and write the result. This allows the SPs to switch contexts easily. 
In practice, one to two orders of magnitude more threads are typically launched 
than the number of SPs, which results in hiding the memory latency: whenever a 
thread is waiting for some data, the associated SP can switch to another thread. 
A GPU has various types of memory. Relevant here are registers and global 
memory. Global memory is used to copy data between the host and the device. 
Registers are used for on-chip storage of thread-local data. Global memory has a 
much higher latency than registers. We use unified memory [28] to store clauses. 
Unified memory creates one virtual memory pool for host and device. In this way, 
the same memory addresses can be used by the host and the device, combining 
the main memory of the host side and the global memory of the device side. 


3 GPU-Accelerated Bounded Model Checking 


We implemented PARAFROST² with CUDA C++ v11. It is a hybrid CPU-GPU
tool, with (sequential) solving done on the host side, and (parallel) VCE done 
on the device side. An interface with CBMC is implemented in C++. CBMC is 
patched to read a configuration file before PARAFROST is instantiated. This 
file contains all options supported by PARAFROST. 


The Workflow. Figure 2 presents the general workflow of PARAFROST in 
the form of an activity diagram with host and device lanes. The diagram is 
focused on inprocessing; preprocessing works similarly on the device. First, the 
host performs a predetermined number of solving iterations. Once those have 
finished, and (un)satisfiability has not yet been proven, relevant clause data is 
copied to the global memory. To hide the latency of this operation as much as 
possible, clauses are copied asynchronously in batches. One batch is copied while 
the next is formatted for the GPU, as not all clause information on the host side 
is relevant for the device (see the next paragraph on data structures). On the 
device, signatures are computed for fast clause comparison, and the clauses are 
sorted for VCE (more on VCE later). Next, the device constructs a histogram, 
for fast lookup of clauses, and sorts the variables. The THRUST library is used 


2 The tool is available at https://gears.win.tue.nl/software/gpu4bmce.
for sorting.³ After that, the host schedules variables for VCE, marking those
variables in the global memory using unified memory. Next, the device applies 
VCE, marking clauses to be removed as DELETED. The host propagates units 
(literals in unit clauses are assigned true), which directly has an effect on the 
formula in the global memory. The VCE procedure is repeated until it has been 
performed a predetermined number of times. After each time, DELETED clauses 
are removed, and after the last iteration, this is done while the new clauses are 
copied to the host. Once this has been done, the overall procedure is repeated. 


Data Structures and Memory Management. We have worked on making 
the storage of each clause in the GPU global memory as efficient as possible. 
However, we also wanted to annotate each clause with sufficient information for 
effective optimisations. In PARAFROST, the following information is stored for 
each clause: 


— The state field (2 bits) stores if the state is ORIGINAL, LEARNT or DELETED. 

— The used field (2 bits) keeps track of how many search iterations a LEARNT 
clause can still be used. LEARNT clauses are used at most twice [6]. 

— Two fields (1 bit each) are used for VCE bookmarking. 

— The literal block distance (lbd) (26 bits) stores the number of decision levels
contributing to a conflict, if there is one [3]. A maximum value of $2^{26}$ turns
out to be sufficient. This field is updated when the clause is altered.

— The size (32 bits) of the clause, i.e., the number of literals.

— A signature sig (32 bits) is a clause hash, for fast clause comparison [15]. 


In addition, a list of literals is stored, each literal taking 32 bits (1 bit to 
indicate whether it is negated or not, and 31 bits to identify the variable). In 
total, a clause requires 12 + 4t bytes, with t the number of literals in the clause. 
For comparison, MINISAT only requires 4 + 4t bytes, but it does not involve the 
used, lbd and sig fields, thereby not supporting the associated optimisations. 
CADICAL [6] uses 28 + 4t bytes, since it applies solving and VCE on the same 
structures. In PARAFROST, the GPU is only used for VCE, in which infor- 
mation for probing [24] and vivification [36], for instance, is irrelevant. Finally, 
in [34], 20 + 4t bytes are used, storing the same information as PARAFROST. 
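
The 12 + 4t byte figure follows from the field widths: 2 + 2 + 1 + 1 + 26 = 32 bits of flags, plus 32 bits each for the size and the signature, plus 32 bits per literal. The Python sketch below illustrates this arithmetic. The bit order inside the flag word, the literal encoding, and the function names are assumptions; the actual CUDA C++ layout in PARAFROST may differ.

```python
import struct

STATES = {"ORIGINAL": 0, "LEARNT": 1, "DELETED": 2}


def pack_clause(state, used, lbd, sig, literals, marks=(0, 0)):
    """Pack a clause into 12 + 4t bytes (assumed field order, little endian)."""
    if not (0 <= used < 4 and 0 <= lbd < 2 ** 26):
        raise ValueError("field does not fit in its bit width")
    # One 32-bit word: state (2 bits), used (2), two marks (1 each), lbd (26).
    flags = STATES[state] | used << 2 | marks[0] << 4 | marks[1] << 5 | lbd << 6
    # Each literal: 31 bits for the variable, 1 bit for the sign.
    encoded = [(abs(lit) << 1) | (1 if lit < 0 else 0) for lit in literals]
    return struct.pack(f"<III{len(encoded)}I", flags, len(encoded), sig, *encoded)


def clause_bytes(t):
    """Storage needed for a clause with t literals."""
    return 12 + 4 * t
```

A clause with three literals therefore occupies 24 bytes in this layout.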

To store a formula S, a clause array is preallocated in the global memory, 
and filled with the clauses of S. More space is allocated than the size of S, to 
allow the addition of clauses that result from VCE. As the amount of allocated 
space is the limiting factor for the addition of new clauses, we have developed a 
memory-aware VCE mechanism, which we explain later in the current section. 


Parallel VCE. PARAFROST supports the VCE rules substitution (i.e., gate 
equivalence reasoning), resolution (RES), subsumption elimination (SUB) and 
eager redundancy elimination (ERE) [15,20]. Substitution applies to patterns 
representing logical gates, and substitutes the involved variables with their gate 
definitions. PARAFROST supports AND/OR, Inverter, If Then Else and XOR. 


3 https://docs.nvidia.com/cuda/thrust.
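RES: $x \cup C_1,\ \bar{x} \cup C_2 \Rightarrow C_1 \cup C_2$   ($x \cup C_1 \notin \mathcal{L} \wedge \bar{x} \cup C_2 \notin \mathcal{L}$)
SUB1: $x \cup C_1 \cup C_2,\ \bar{x} \cup C_2 \Rightarrow C_1 \cup C_2,\ \bar{x} \cup C_2$
SUB2: $C_1 \cup C_2,\ C_2 \Rightarrow C_2$   ($C_2 \in \mathcal{L} \Rightarrow \mathcal{L}' = \mathcal{L} \setminus \{C_2\}$)
ERE: $x \cup C_1,\ \bar{x} \cup C_2,\ C_1 \cup C_2 \Rightarrow x \cup C_1,\ \bar{x} \cup C_2$   ($\{x \cup C_1, \bar{x} \cup C_2\} \cap \mathcal{L} \neq \emptyset \Rightarrow C_1 \cup C_2 \in \mathcal{L}$)


Fig. 3. VCE rules in PARAFROST. $C_1$ and $C_2$ are non-empty sets of literals.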


In Fig. 3, we provide rewrite rules for SUB and RES. If clauses exist in S of
the form expressed by the left hand side of a rule, then the rule is applicable,
and the involved clauses are replaced by the clauses (called resolvents) on the
right hand side. RES is applicable if there are two clauses of the form $x \cup C_1$ and
$\bar{x} \cup C_2$, and applying it results in replacing those with a clause $C_1 \cup C_2$. SUB
consists of two rules; the second is applied once the first is no longer applicable.

Conditions are given between parentheses. For RES, only ORIGINAL clauses 
are considered. Besides that, if C1 U C2 evaluates to true, it is actually not 
created. As LEARNT clauses are sometimes deleted during solving, SUB2 should 
only produce ORIGINAL clauses; if C2 is LEARNT before applying the rule, it will 
become ORIGINAL ($\mathcal{L}'$ refers to the set of LEARNT clauses after application). For
ERE, LEARNT clauses cannot cause the deletion of an ORIGINAL clause. 
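
The RES and SUB2 rules can be sketched sequentially in Python as follows. This is a simplified illustration, not PARAFROST's parallel implementation: it ignores the distinction between LEARNT and ORIGINAL clauses, encodes literals as signed integers and clauses as frozensets, and uses hypothetical function names.

```python
def resolvents(formula, x):
    """RES on variable x: replace all clauses containing x or -x by their
    pairwise resolvents, skipping resolvents that evaluate to true."""
    pos = [c for c in formula if x in c]
    neg = [c for c in formula if -x in c]
    rest = {c for c in formula if x not in c and -x not in c}
    for p in pos:
        for n in neg:
            r = (p - {x}) | (n - {-x})
            if not any(-lit in r for lit in r):   # tautologies are not created
                rest.add(r)
    return rest


def remove_subsumed(formula):
    """SUB2: drop every clause that strictly contains another clause."""
    return {c for c in formula if not any(d < c for d in formula)}
```

Eliminating variable 1 from {1, 2}, {-1, 3}, {-1, -2} produces only the resolvent {2, 3}, because {2, -2} is a tautology.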

VCE is applied in parallel by PARAFROST by scheduling sets of mutually
independent variables for analysis. Two variables x and y are independent in S
iff S does not contain a clause containing literals that refer to both variables,
i.e., $S_x \cup S_{\bar{x}}$ and $S_y \cup S_{\bar{y}}$ are disjoint. This ensures that two threads focussing
on x and y, respectively, do not cause data races. In incremental solving,
variables referred to by assumptions must be excluded from VCE. In each VCE
iteration, a different set W of variables is selected. This is achieved by using
an upper bound $\mu$ on the number of occurrences of a variable in S. After each
iteration, $\mu$ is increased, allowing the selection of more variables. PARAFROST
supports configuring $\mu$ and the number of VCE iterations.
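
One possible way to select such a set is a greedy pass over the variables, sketched below in Python. The greedy strategy, the ordering by number of occurrences, and the function name are assumptions made for illustration; the text only states that the selected variables are mutually independent, occur at most a bounded number of times, and are not referred to by assumptions.

```python
def elect_variables(formula, mu, excluded=frozenset()):
    """Greedily pick mutually independent variables with at most mu occurrences.

    formula:  iterable of clauses (sets of signed integers)
    excluded: variables that must not be eliminated, e.g. assumption variables
    """
    occurrences = {}
    for clause in formula:
        for lit in clause:
            occurrences.setdefault(abs(lit), []).append(clause)
    elected, frozen = [], set()
    # Assumed heuristic: try variables with few occurrences first.
    for var in sorted(occurrences, key=lambda v: (len(occurrences[v]), v)):
        if var in excluded or var in frozen or len(occurrences[var]) > mu:
            continue
        elected.append(var)
        # Every variable sharing a clause with 'var' becomes ineligible.
        for clause in occurrences[var]:
            frozen.update(abs(lit) for lit in clause)
    return elected
```

Because each elected variable freezes all variables it shares a clause with, no two elected variables occur in a common clause, so their clause sets can be processed by different threads without overlap.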

As already mentioned, clauses that can be removed are marked DELETED 
before they are removed. The removal of clauses is done once VCE has finished 
(see Fig. 2) to avoid data races. However, because of this, VCE may at first 
require more memory to store clauses. The clauses added during VCE must fit 
in the memory, otherwise the procedure fails. To ensure this, we have developed 
a memory-aware mechanism for VCE. Next, we explain this mechanism for the 
RES rule and substitution, as the application of those rules results in new clauses. 

Algorithm 1 presents how RES and substitution are applied in PARAFROST. 
It requires S, stored in a clause array clauses. As clauses are of varying sizes, we 
need an array references that provides a reference to each clause. In addition, 
arrays varinfo, cindex and rindex are given, which are filled in the first lines. 

At line 1, the kernel VCESCAN is called in which a different thread is assigned
to each variable $x \in W$. Each thread checks the applicability of VCE rules
on its variable and computes the number of clauses and literals that will be
produced by the first applicable rule. A thread with ID i stores the type $\tau$ of the
applicable rule (NONE, RESOLVE, or SUBSTITUTE) and the number of clauses $\beta$ and
Algorithm 1: Parallel memory-aware application of RES and substitution


Input : global W, clauses, references, varinfo, cindex, rindex


 1 varinfo ← VCESCAN(W, S)
 2 cindex ← COMPUTECLAUSEINDICES(varinfo, SIZE(clauses))
 3 rindex ← COMPUTECLAUSEREFINDICES(varinfo, SIZE(references))
 4 VCEAPPLY(W, clauses, references, varinfo, cindex, rindex)
 5 kernel VCEAPPLY(W, clauses, references, varinfo, cindex, rindex):
 6     for all i ∈ [0, |W|) do in parallel
 7         register cidx ← cindex[i], ridx ← rindex[i]
 8         register τ, β, γ ← varinfo[i]
 9         if τ = RESOLVE ∧ MEMORYSAFE(ridx, cidx, β, γ) then
10             RESAPPLY(clauses, references, x, ridx, cidx)
11         if τ = SUBSTITUTE ∧ MEMORYSAFE(ridx, cidx, β, γ) then
12             SUBAPPLY(clauses, references, x, ridx, cidx)
13 device function MEMORYSAFE(ridx, cidx, β, γ):
14     reqSpace ← cidx + 12 × β + 4 × γ    // required number of bytes
15     if reqSpace > CAPACITY(clauses) then return false
16     numRefs ← ridx + β    // required number of clause references
17     if numRefs > CAPACITY(references) then return false
18     return true


literals $\gamma$ produced by that rule in one integer at varinfo[i]. At lines 2-3, kernels
COMPUTECLAUSEINDICES and COMPUTECLAUSEREFINDICES are called to add
up the $\beta$'s and $\gamma$'s to obtain offsets into the arrays references and clauses
(the method SIZE(A) refers to the amount of data in array A). Both methods
apply a parallel exclusive prefix sum [37], involving the $\beta$'s and $\gamma$'s. The result
is that thread i, assigned to x, is instructed to start writing clause references
at references[rindex[i]] and clauses at clauses[cindex[i]] when applying the
next VCE rule for x. Whether the data actually fits is checked later.

Next, the kernel VCEAPPLY is called (lines 5-12). To each variable in W,
a thread is assigned. It retrieves the precomputed data (lines 7-8) and either
applies the RES rule (lines 9-10), substitution (lines 11-12), or nothing, in case
$\tau$ = NONE. However, a condition for applying a rule is that there is enough space,
which is checked using the device function MEMORYSAFE (lines 13-18). The
amount of allocated space for A is reflected by CAPACITY(A), and MEMORYSAFE
checks if there is enough space in clauses, starting at cidx (lines 14-15). If there
is, it is checked if the references can be stored in references (lines 16-17).
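
The offset computation and the memory check can be mimicked sequentially. The Python sketch below is an illustration under assumptions: it replaces the parallel exclusive prefix sum by `itertools.accumulate`, treats clause offsets as byte offsets, and uses a hypothetical function `plan_vce` that only reports which rule applications would pass the check.

```python
from itertools import accumulate

HEADER_BYTES, LITERAL_BYTES = 12, 4


def exclusive_prefix_sum(values, start=0):
    """Sequential stand-in for a parallel exclusive prefix sum."""
    return list(accumulate(values, initial=start))[:-1]


def plan_vce(varinfo, clauses_size, clauses_capacity, refs_size, refs_capacity):
    """varinfo holds one (rule type, #new clauses, #new literals) triple per variable.

    Returns (rule type, clause offset, reference offset, memory_safe) per variable.
    """
    space = [HEADER_BYTES * beta + LITERAL_BYTES * gamma for _, beta, gamma in varinfo]
    cindex = exclusive_prefix_sum(space, clauses_size)
    rindex = exclusive_prefix_sum([beta for _, beta, _ in varinfo], refs_size)
    plan = []
    for (tau, beta, _), need, cidx, ridx in zip(varinfo, space, cindex, rindex):
        safe = cidx + need <= clauses_capacity and ridx + beta <= refs_capacity
        plan.append((tau, cidx, ridx, tau != "NONE" and safe))
    return plan
```

A rule whose output would exceed either capacity is skipped for this round, while the offsets of the other variables remain valid because they were computed beforehand.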


4 Multiple Decision Making in Incremental Solving 


Given the fact that BMC SAT formulas often have many variables, a recently 
proposed extension of CDCL [30], in which periodically multiple decisions are 
made (MDM) at the same time, has much potential to speed up BMC. When the 
MDM method is called, it constructs a set $M = \{\ell \in \mathbb{L} \mid Var(\{\ell\}) \cap Var(X) = \emptyset\}$
such that there does not exist a clause $C \in S$ with $|Var(C) \setminus Var(X)| = 1$. In
other words, the decisions M do not lead to logical follow-up assignments, i.e., 
implications. The reason for this restriction is that implications may lead to 
conflicts (clauses that cannot be satisfied). When a single decision is made, this 
decision needs to be rolled back when a conflict is caused, but when multiple 
Algorithm 2: Decision making method DECIDE, with integrated MDM


Input: X, L, decqueue, r, nConflicts, ConfFactor, prevMDsize
 1 freevars ← Var(L) \ Var(X)
 2 if r > 0 then
 3     M ← MDM(freevars, decqueue)
 4     r ← r − 1, prevMDsize ← |M|
 5 else
 6     M ← SINGLEDECISION(freevars, decqueue)
 7 if r = 0 ∧ |freevars| > prevMDsize then
 8     r ← PERIODICFUSE(nConflicts, ConfFactor)
 9 return M
10 function PERIODICFUSE(nConflicts, ConfFactor):
11     if nConflicts > ConfFactor then
12         UPDATEFACTOR(ConfFactor)
13         return mdmrounds
14     else
15         return 0


decisions are made, detecting which decisions actually cause a conflict is more 
difficult. Note that MDM cannot always make multiple decisions; implications 
are needed to solve a formula, so single decisions still have to be made frequently. 

In [30], MDM was integrated into MINISAT and GLUCOSE, and since multiple 
decisions should be selected periodically, a mechanism was proposed that decides 
when to make multiple decisions based on the solver restart policy. However, 
since solvers can differ greatly in this policy, we wanted to create an alternative 
mechanism not depending on this. PARAFROST is based on CADICAL [6], 
which has a very different restart policy compared to MINISAT and GLUCOSE. 

Algorithm 2 presents PARAFROST’s DECIDE method, which is called every 
time a decision must be made. Besides X and L, it is given a queue decqueue, in 
which the variables are ordered based on a decision heuristic. In PARAFROST, 
the heuristics Variable State Independent Decaying Sum (VSIDS) [27] and Variable
Move-To-Front (VMTF) [7] are used alternately. The latter was not
used in [30]. DECIDE also gets a variable r, initially set to the constant
mdmrounds. These values are used to control the periodic call of MDM, in which
a set of multiple decisions is made per round. Experiments have shown that
mdmrounds = 3 is effective [30]. Finally, the number of conflicts so far (nConflicts),
a variable ConfFactor used to switch MDM on and off, and a variable
prevMDsize, storing the size of the most recent set of multiple decisions, are
given.

To select new decisions, the set of unassigned variables is created at line 1. 
If we are calling MDM mdmrounds times (line 2), then MDM is called again 
and r is updated. The alternative is to make a single decision (line 6). If we 
have stopped calling MDM, and enough unassigned variables are present (line 
7), method PERIODICFUSE is called, which either sets r back to mdmrounds or to 
0, depending on nConflicts (lines 10-15). There are enough unassigned variables 
if there are more unassigned variables than variables in the most recent multiple 
decisions set. In PERIODICFUSE, nConflicts is compared to ConfFactor, which is 
initially set to a configurable value (default 2,000). ConfFactor is updated using 
a function UPDATEFACTOR. This makes ConfFactor grow linearly, to achieve a
suitable balance between ConfFactor and nConflicts as the solving progresses. 


5 Benchmarks 


We conducted experiments with CBMC in combination with MINISAT (the 
default), GLUCOSE, CADICAL, PARAFROST, PARAFROST with MDM, and 
a CPU-only version, referred to as PARAFROST (NOGPU).⁴ We used the AWS
benchmarks in which the data structures hash table, array list, array buff, 
linked list, priority queue, byte cursor and string were analysed. The 
loop unwinding upper-bounds 8, 16, 64, 128 and 1,000 were used, resulting in 
168 different verification problems. 

All experiments were executed on the DAS-5 cluster [4]. Each program was 
verified in isolation on a separate node, with a time-out of 3,600 s. Each node had 
an Intel Xeon E5-2630 CPU (2.4GHz) with 64 GB of memory, and an NVIDIA 
RTX 2080 Ti, with 68 SMs (64 cores/SM) and 11 GB global memory. 

Figure 4 presents the decision procedure runtime, and how much time was 
spent on VCE. PARAFROST outperforms all sequential solvers, including CADICAL
(plot 4a). Even though PARAFROST is based on CADICAL, its different
data structures, simplification mechanism, and parameters tuned for large formulas
make PARAFROST more effective in these experiments. MDM further
improves PARAFROST. Plot 4b demonstrates that CBMC with MINISAT often 
spends most of the time on VCE. PARAFROST significantly reduces the time 
spent on VCE compared to other solvers. 

In Table 1, the Verified column lists, per solver, the number of verified
programs, and PAR-2 gives the penalized average runtime-2 metric. The PAR-2 score
sums the running times of all solved instances and 2x the time-out for each
unsolved one, and divides the total by the number of formulas. The solver with the
lowest score is the winner. The triangles ▲ and ▼ mean significantly better and
worse, respectively. The MINISAT column lists how many programs were verified
faster with the other solvers than with MINISAT. Between parentheses,
it is given how many of those programs were not solved by MINISAT at all. The
final four columns serve the same purpose for the other solvers. For example,
PARAFROST-MDM verified 123 programs faster than CADICAL, of which 12
could not be verified by the latter. The last two rows provide a similar
comparison. Clearly, PARAFROST-MDM verified the largest number of programs, with
the lowest score.
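
As a minimal sketch of how such a score can be computed, the following Python function implements the PAR-2 definition given above. The function name, the use of None to mark an unsolved instance, and the sample runtimes are assumptions made for illustration; they are not taken from the benchmark data.

```python
from typing import Optional, Sequence


def par2(runtimes: Sequence[Optional[float]], timeout: float) -> float:
    """Mean runtime in which every unsolved instance (None) counts as 2 * timeout."""
    if not runtimes:
        raise ValueError("need at least one instance")
    total = sum(t if t is not None else 2 * timeout for t in runtimes)
    return total / len(runtimes)


# Hypothetical data: three solved instances and one time-out at 3,600 s.
score = par2([10.0, 250.0, 1200.0, None], timeout=3600.0)
print(score)
```

A lower score is better, and one unsolved instance outweighs many slow but solved ones, which is why the metric rewards solvers that verify more programs.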

Figure 5 presents the speedups of the PARAFROST configurations
for the individual cases. Overall, SAT solving was accelerated effectively
with PARAFROST and PARAFROST-MDM. Compared to PARAFROST
(NOGPU), PARAFROST (and PARAFROST-MDM) accelerated multiple
instances by up to 18x (and 27x), and the geometric average speedup for all
programs was 1.3x (and 1.6x).
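
The geometric average of per-instance speedups can be sketched as follows. The function name and the sample runtimes are illustrative assumptions; only the definition of the geometric mean itself is standard.

```python
import math
from typing import Sequence


def geometric_mean_speedup(baseline: Sequence[float],
                           accelerated: Sequence[float]) -> float:
    """Geometric mean of the per-instance ratios baseline / accelerated."""
    if not baseline or len(baseline) != len(accelerated):
        raise ValueError("need two non-empty runtime lists of equal length")
    # Averaging in log space avoids overflow and matches the n-th root of the product.
    log_sum = sum(math.log(b / a) for b, a in zip(baseline, accelerated))
    return math.exp(log_sum / len(baseline))


# Hypothetical runtimes in seconds: one instance is 4x faster, one is unchanged.
speedup = geometric_mean_speedup([8.0, 2.0], [2.0, 2.0])
print(round(speedup, 3))
```

Unlike the arithmetic mean, the geometric mean is not dominated by a few very large speedups, which explains how a maximum of 18x can coexist with an average of 1.3x.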


4 We also tried to use CBMC with Z3, but were not able to correctly configure this 
combination at the time of writing. 
Fig. 4. CBMC runtimes for all solvers over the benchmark suite.


Table 1. CBMC performance analysis using the various solvers. 


Configuration      Verified  PAR-2    MiniSat      Glucose       CaDiCaL       PFCPU      PFGPU
CBMC + MiniSat     143       1219     n/a          n/a           n/a           n/a        n/a
CBMC + Glucose     139       ▼ 1388   ▼ 49 (-4)    n/a           n/a           n/a        n/a
CBMC + CaDiCaL     143       1226     43           53 (+4)       n/a           n/a        n/a
CBMC + PFCPU       154       824      51 (+11)     62 (+15)      83 (+11)      n/a        n/a
CBMC + PFGPU       155       ▲ 765    ▲ 66 (+12)   ▲ 83 (+16)    ▲ 96 (+12)    120 (+1)   n/a
CBMC + PFGPU-MDM   155       ▲ 743    ▲ 84 (+12)   ▲ 102 (+16)   ▲ 123 (+12)   133 (+1)   121
Fig. 5. Speedups of the individual cases.


6 Conclusion 


We have presented PARAFROST, the first tool to accelerate BMC using GPUs. 
Given that BMC formulas tend to have much redundancy, PARAFROST effec- 
tively reduces solving times with GPU pre- and inprocessing, and by using 
MDM, which is particularly effective when many variables are present. In the 
future, we will combine our approach with (existing) multi-threaded BMC. We 
expect these techniques to strengthen each other. 
Abstract. Symbolic model checking is an important tool for finding
bugs (or proving the absence of bugs) in modern system designs. Because 
of this, improving the ease of use, scalability, and performance of model 
checking tools and algorithms continues to be an important research 
direction. In service of this goal, we present Pono, an open-source SMT- 
based model checker. Pono is designed to be both a research platform 
for developing and improving model checking algorithms, as well as a 
performance-competitive tool that can be used for academic and indus- 
try verification applications. In addition to performance, Pono prioritizes 
transparency (developed as an open-source project on GitHub), flexibil- 
ity (Pono can be adapted to a variety of tasks by exploiting its general 
SMT-based interface), and extensibility (it is easy to add new algorithms 
and new back-end solvers). In this paper, we describe the design of the 
tool with a focus on the flexible and extensible architecture, cover its cur- 
rent capabilities, and demonstrate that Pono is competitive with state- 
of-the-art tools. 


1 Introduction 


Model checking [39,61] is an influential verification capability in modern system 
design. Its greatest success has been with finite-state systems, where proposi- 
tional methods such as binary decision diagrams (BDDs) [28] and Boolean sat- 
isfiability (SAT) solvers [69] are used as verification engines. At the same time, 
significant efforts have been made to lift model checking techniques from finite- 
state to infinite-state systems [24,30,31,35,46,63]. This requires more expressive 
verification engines, such as solvers for satisfiability modulo theories (SMT) [19]. 
Proponents of SMT-based techniques argue that such techniques can also benefit 


“Pono” is the Hawaiian word for proper, correct, or goodness. Our goal is that Pono 
can be a useful tool for people to verify the correctness of systems. 
© The Author(s) 2021 
finite-state systems, due to their ability to leverage word-level reasoning. Indeed,
a word-level model checker won the most recent hardware model checking com- 
petition [22], giving credence to this claim. Despite these successes, there remain 
many directions for exploration in model checking. In this paper, we present 
Pono, an SMT-based model checking tool, with the goal of providing an open 
research platform for advancing these efforts. 

Pono is designed with three use cases in mind: 1) push-button verification; 
2) expert verification; and 3) model checker development. For 1, Pono provides 
competitive implementations of standard model checking algorithms. For 2, it 
exposes a flexible API, affording expert users fine-grained control over the tool. 
This can be useful in traditional model checking tasks (e.g., manually guiding 
the tool to an invariant, or adjusting the encoding for better performance), but 
it also enables the tool to be easily adapted for other tasks. In addition, Pono 
is designed using a completely generic SMT solver interface, making it trivial 
to experiment with different back-end solvers. For 3, Pono is open-source [7] 
and designed to be easily modifiable and extensible with a simple, modular, 
and hierarchical architecture. Taken together, these features make it relatively 
easy to do controlled experiments by comparing results obtained using Pono, 
while varying only the SMT solver or the model checking algorithm. Pono has 
already been used in a variety of research projects, both for model checking and 
other custom applications. It has also been used in two graduate level courses at 
Stanford University, where students used both the command-line interface and 
the API. With this promising start, we hope it will have a long and productive 
existence supporting research, education, and industry. 


2 Design 


Pono is designed around the manipulation and analysis of transition systems. 
A symbolic transition system is a tuple (X, I, T), where X is a set of (sorted)
uninterpreted constants referred to as the current-state variables of the system 
and coupled with corresponding next-state variables X’; I(X) is a formula con- 
straining the initial states of the system; and T(X, X’) is a formula expressing 
the transition relation, which encodes the dynamics of the system. The transi- 
tion system representation provides a clean and general interface, allowing Pono 
to target both hardware and software model checking. Pono is designed to fully 
leverage the expressivity and reasoning power of modern SMT solving. Its for- 
mulas use the language and semantics of the SMT-LIB standard [17], and its 
model checking algorithms use an SMT solving oracle. To streamline the interac- 
tion with SMT solvers, Pono uses Smt-Switch [59], an open-source C++ API for 
SMT solving. Smt-Switch provides a convenient, efficient, and generic interface 
for SMT solving. Smt-Switch supports a variety of SMT solver back-ends and 
can switch between them easily. 
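
To make the tuple (X, I, T) concrete, the following Python sketch represents a tiny finite-state system with I and T as ordinary predicates and enumerates its reachable states by brute force. This is an illustration only: the class and function names are hypothetical, and Pono itself represents I and T as SMT formulas rather than Python callables.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class TransitionSystem:
    variables: Dict[str, range]              # X, with a finite domain per variable
    init: Callable[[State], bool]            # I(X)
    trans: Callable[[State, State], bool]    # T(X, X')

    def states(self) -> List[State]:
        names = list(self.variables)
        return [dict(zip(names, vals)) for vals in product(*self.variables.values())]


def reachable(ts: TransitionSystem) -> List[State]:
    """All states reachable from I by repeatedly applying T (brute force)."""
    all_states = ts.states()
    seen = [s for s in all_states if ts.init(s)]
    frontier = list(seen)
    while frontier:
        cur = frontier.pop()
        for nxt in all_states:
            if ts.trans(cur, nxt) and nxt not in seen:
                seen.append(nxt)
                frontier.append(nxt)
    return seen


# Toy system: a counter that starts at 0 and steps by 2 modulo 8.
counter = TransitionSystem(
    variables={"x": range(8)},
    init=lambda s: s["x"] == 0,
    trans=lambda s, n: n["x"] == (s["x"] + 2) % 8,
)
values = sorted(s["x"] for s in reachable(counter))
print(values)
```

In this toy system the property "x is even" is an invariant, because it holds in every reachable state. A symbolic model checker establishes the same kind of fact without enumerating states.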

The diagram in Fig. 1 displays the overall architecture of Pono. The blocks
with a dashed outline are globally available and used throughout the codebase. 
The Pono API provides access to all of the components shown, supporting the 
design goal of giving expert users control and flexibility. 
Fig. 1. Architecture diagram


Core. The TransitionSystem class in Pono represents symbolic transition sys- 
tems as structured Smt-Switch terms. Key data structures include the following: 
i) inputvars: a vector of Smt-Switch symbolic constants representing primary 
inputs to the system (i.e., they are part of X, but their primed versions are not 
used and cannot appear in T); ii) statevars: a vector of Smt-Switch symbolic 
constants corresponding to the non-input state variables (the remaining variables 
in X); iii) next_map: a map from current (X) to next-state (X’) variables; iv) 
init: an Smt-Switch formula representing I(X); and v) trans: an Smt-Switch
formula representing T(X, X’). 

There are two kinds of transition systems: RelationalTransitionSystem 
and FunctionalTransitionSystem. The former has no restrictions on the form 
of the transition relation, while the latter is restricted to only functional updates: 
an equality (update assignment) with a next-state variable on the left and a 
function of current-state and input variables on the right. Some model check- 
ing algorithms take advantage of this structure [46,47]. Built-in checks ensure 
compliance with the restrictions. 
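
The functional restriction can be illustrated with a small Python sketch in which the transition relation is assembled only from update assignments. The class FunctionalSystem and its methods are hypothetical stand-ins, not Pono's API; the sketch only mirrors the idea of a built-in compliance check.

```python
from typing import Callable, Dict, Iterable

Valuation = Dict[str, int]
Update = Callable[[Valuation, Valuation], int]  # (state, inputs) -> next value


class FunctionalSystem:
    """Hypothetical stand-in: T is built only from assignments x' := f(X, inputs)."""

    def __init__(self, state_vars: Iterable[str], input_vars: Iterable[str]):
        self.state_vars = set(state_vars)
        self.input_vars = set(input_vars)
        self.updates: Dict[str, Update] = {}

    def assign_next(self, var: str, update: Update) -> None:
        # Compliance check: the left-hand side must be a state variable,
        # and each state variable gets at most one update.
        if var not in self.state_vars:
            raise ValueError(f"{var} is not a state variable")
        if var in self.updates:
            raise ValueError(f"{var} already has an update")
        self.updates[var] = update

    def trans(self, state: Valuation, inputs: Valuation, nxt: Valuation) -> bool:
        """T(X, X') as the conjunction of the equalities x' = f(X, inputs)."""
        return all(nxt[v] == f(state, inputs) for v, f in self.updates.items())


# A 2-bit counter with an enable input: x' = (x + en) mod 4.
fts = FunctionalSystem(state_vars=["x"], input_vars=["en"])
fts.assign_next("x", lambda s, i: (s["x"] + i["en"]) % 4)

wraps = fts.trans({"x": 3}, {"en": 1}, {"x": 0})   # 3 + 1 wraps around to 0
holds = fts.trans({"x": 3}, {"en": 0}, {"x": 3})   # disabled: the value is kept
try:
    fts.assign_next("en", lambda s, i: 0)           # inputs have no next-state version
    rejected = False
except ValueError:
    rejected = True
print(wraps, holds, rejected)
```

A relational system would instead accept an arbitrary predicate over current and next-state variables, including one that allows several successors for the same state and inputs.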

A Property is an Smt-Switch formula representing a property to check for 
invariance.¹ A ProverResult is an enum which can be one of the following:
i) UNKNOWN (result could not be determined, including incompleteness due to 
checking only up to some bound); ii) FALSE (the property does not hold); iii) 
TRUE (the property holds); and iv) ERROR (there was an internal error). The 
Unroller is a class for producing unrolled transition systems, i.e., encoding a 
finite-length symbolic execution by introducing fresh variables for each timestep. 
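
The idea behind unrolling can be sketched in a few lines of Python. Formulas are modeled here as nested tuples, and the naming scheme (x@k for the copy of x at timestep k, and a ".next" suffix for next-state variables) is an assumption made for this illustration, not a description of the Unroller's internals.

```python
from typing import List, Tuple, Union

Expr = Union[str, int, Tuple]   # variable name, constant, or (op, arg, ...)


def at_time(expr: Expr, k: int) -> Expr:
    """Rename x to x@k and x.next to x@(k+1) throughout expr."""
    if isinstance(expr, str):
        if expr.endswith(".next"):
            return f"{expr[:-len('.next')]}@{k + 1}"
        return f"{expr}@{k}"
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(at_time(a, k) for a in args))
    return expr  # constants do not depend on time


def unroll(init: Expr, trans: Expr, bound: int) -> List[Expr]:
    """Constraints for an execution of `bound` steps: I@0, T@0, ..., T@(bound-1)."""
    return [at_time(init, 0)] + [at_time(trans, k) for k in range(bound)]


# I(X): x = 0 and T(X, X'): x' = x + 1, unrolled for two steps.
init = ("=", "x", 0)
trans = ("=", "x.next", ("+", "x", 1))
constraints = unroll(init, trans, 2)
for c in constraints:
    print(c)
```

Each timestep gets its own fresh copy of every variable, so the conjunction of the resulting constraints describes all executions of the given length.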


1 Pono currently supports invariant checking. Support for temporal properties is left 
to future work. 
Engines. Model checking algorithms are implemented as subclasses of the
abstract class Prover and stored in the engines directory. We cover the current 
suite of engines in more detail in Sect. 3. 


Frontends. Although users can manually build transition systems through the 
API, it is also convenient to generate transition systems from structured input 
formats. Pono includes the following frontends: i) BTOR2Encoder: uses the open- 
source btor2tools [2] library to read the BTOR2 [66] format for hardware model 
checking; ii) SMVEncoder: supports a subset of nuXmv’s [30] SMT-based the- 
ory extension of SMV [61], which added support for infinite-state systems; iii) 
CoreIREncoder: encodes the CoreIR [11] circuit intermediate representation. 
Note that Verilog [10] can be supported by using a translator from Verilog 
to either BTOR2 or SMV. Examples of translators include Yosys [72] and Ver- 
ilog2SMV [53], both of which are open-source. 


Printers. Pono prints witness traces when a property does not hold. The sup- 
ported formats are the BTOR2 witness format and the VCD standard format used 
by EDA tools [10]. For theories such as arithmetic that are not supported by 
these formats, Pono implements simple extensions, ensuring that all variable 
assignments are included in witness traces. 


Modifiers and Refiners. Pono includes functions that perform various trans- 
formations on transition systems, including: adding an auxiliary variable [14]; 
building an implicit predicate abstraction [70]; and computing a static cone-of- 
influence reduction for a functional transition system under a given property. It 
also includes functions for refining an abstract transition system. 


Utils and Options. utils contains a collection of general-purpose classes and 
functions for manipulating and analyzing Smt-Switch terms and transition sys- 
tems. options contains a single class, PonoOptions, for managing command-line 
options. 


API. Pono’s native API is in C++. In addition, Pono has Python bindings that 
interact with the Smt-Switch Python bindings, both written in Cython [20]. 
These bindings behave very similarly to “pure” Python objects, allowing intro- 
spection and pythonic use of the API. 


We follow best practices for modern C++ development and code quality 
maintenance, including issue tracking, code reviews, and continuous integration 
(via GitHub Actions). The build infrastructure is written in CMake [3] and is 
configurable. The Pono repository also provides helper scripts for installing its 
dependencies. We support GoogleTest [5] for unit testing and gperftools [12] 
for code profiling. Tests can be parameterized by both the SMT solver and the 
algorithm or type of transition system. We utilize PyTest [9] to manage and 
parameterize unit tests for the python bindings. 


3 Capabilities 


In this section, we highlight some key capabilities of Pono. The design makes 
use of abstract interfaces and inheritance to make it easy to add or extend 
functionality. Base class implementations of core functionality are provided but
are kept simple to prioritize readability and transparency. And, of course, they 
can be overridden using inheritance and virtual functions. 

We start by describing the interface and engines provided for push-button 
verification. Next, we take a closer look at two ways that the basic architecture 
can be extended. We then show how to use Pono to reason about a transition 
system using algebraic datatypes, demonstrating the expressive power provided 
by the SMT back-end. 


Main Engines. All model checking algorithms in Pono are derived classes of 
the abstract base class Prover. The base class defines a simple public interface 
through a set of virtual functions: 


— initialize initializes any objects and data structures the prover needs. 

— check_until takes a non-negative integer parameter, k (the effort level), and 
calls the prover engine (the meaning of k is algorithm-dependent: in BMC [21] 
and k-induction [68], k is the unrolling length, and in IC3-style [25] algorithms,
it is the number of frames). The interface allows check_until to be called 
repeatedly with increasing values of k. An incremental algorithm can take 
advantage of this to reuse proof effort from previous calls. Engines that pro- 
duce full proofs can do so as long as they do it within the provided effort 
level. 

— prove attempts to prove a property without any limit on the bound. 

— witness is called after a failed call to prove or check_until. It provides 
variable assignments for each step in a counterexample trace. 

— invar is called after a successful full proof; it returns an inductive invariant 
that implies the property. The invariant is an Smt-Switch Term over current- 
state variables. Not all algorithms support this functionality. 
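
The interface above can be mirrored in Python to show how repeated calls to check_until reuse earlier work. This is a sketch under stated assumptions: the method names follow the list above, but the toy engine ExplicitBoundedSearch works on explicit integer states instead of SMT queries, and none of this code is Pono's C++ or Python API.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable, Dict, List, Optional


class ProverResult(Enum):
    UNKNOWN = 0
    FALSE = 1
    TRUE = 2
    ERROR = 3


class Prover(ABC):
    @abstractmethod
    def initialize(self) -> None: ...

    @abstractmethod
    def check_until(self, k: int) -> ProverResult: ...

    @abstractmethod
    def witness(self) -> List[int]: ...

    def prove(self) -> ProverResult:
        """No bound: raise the effort level until the answer is definite."""
        k = 0
        while True:
            res = self.check_until(k)
            if res is not ProverResult.UNKNOWN:
                return res
            k += 1


class ExplicitBoundedSearch(Prover):
    """Toy engine over integer states, standing in for an SMT-based engine."""

    def __init__(self, init: List[int], succ: Callable[[int], List[int]],
                 prop: Callable[[int], bool]):
        self.init, self.succ, self.prop = init, succ, prop

    def initialize(self) -> None:
        self.depth = 0                                      # steps explored so far
        self.parent: Dict[int, Optional[int]] = {s: None for s in self.init}
        self.frontier = list(self.init)                     # states first seen at self.depth
        self.bad: Optional[int] = None

    def check_until(self, k: int) -> ProverResult:
        while True:
            bad = next((s for s in self.frontier if not self.prop(s)), None)
            if bad is not None:
                self.bad = bad
                return ProverResult.FALSE
            if not self.frontier:
                return ProverResult.TRUE        # every reachable state was checked
            if self.depth >= k:
                return ProverResult.UNKNOWN     # effort level exhausted
            nxt = []
            for s in self.frontier:
                for t in self.succ(s):
                    if t not in self.parent:
                        self.parent[t] = s
                        nxt.append(t)
            self.frontier, self.depth = nxt, self.depth + 1

    def witness(self) -> List[int]:
        trace, s = [], self.bad
        while s is not None:
            trace.append(s)
            s = self.parent[s]
        return trace[::-1]


# Counter modulo 8 starting at 0; the property x != 3 fails after three steps.
engine = ExplicitBoundedSearch([0], lambda x: [(x + 1) % 8], lambda x: x != 3)
engine.initialize()
results = [engine.check_until(k) for k in (1, 2, 3)]
print(results, engine.witness())
```

Because the frontier and the visited set persist between calls, each call with a larger k continues from where the previous one stopped instead of starting over.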


Pono has several engines, all of which have been lifted to the SMT-level. We 
now list the main engines and include the corresponding lines of code (LoC) in 
the primary source file (the LoC includes all comments and license headers): 
1. Bounded Model Checking [21] (88 LoC); 2. K-Induction [68] (161 LoC); 3. 
Interpolant-based Model Checking [62] (230 LoC); 4. IC3-style algorithms [25] 
(see below for LoC). The engines leverage the reusable infrastructure described 
in Sect. 2 (e.g., the Unroller for the unrolling based techniques). 


IC3 Variants. IC3 is widely recognized as one of the best-performing algorithms 
for SAT-based model checking [43]. Liftings to SMT are an area of active research 
and have produced several variations with promising results [23,24,34,35,47,51, 
54,55,71]. To support this active research direction, Pono includes a special IC3 
base class IC3Base, which implements a framework common to all variations of 
the algorithm.² The framework has several parameters that can be provided by
specific instances of the algorithm: IC3Formula is a configurable data structure 
used to represent formulas constraining IC3 frames; inductive_generalization 
is the method used for inductive generalization; predecessor_generalization 


2 For details on how the IC3 algorithm works, we refer the reader to [25,43].
is the method used for predecessor generalization; and abstract and refine
are methods that can be implemented for abstraction-refinement approaches to
IC3 [35,47]. The implementation of IC3Base is 1086 lines of code. Current instantiations
of IC3Base implemented in Pono include: i) IC3: a standard Boolean
IC3 implementation [25,43] (152 LoC); ii) IC3Bits: a simple extension of IC3 to 
bit-vectors, which learns clauses over the individual bits (113 LoC); iii) Model- 
based IC3: a naive implementation of IC3 lifted to SMT, which learns clauses 
of equalities between variables and model values (397 LoC); iv) IC3IA: IC3 via 
Implicit Predicate Abstraction [35] (456 LoC); v) IC3SA: a basic implementation 
of IC3 with Syntax-Guided Abstraction for hardware verification [47] (984 LoC); 
vi) SyGuS-PDR: a syntax-guided synthesis approach for inductive generalization 
targeting hardware designs [73] (1047 LoC). 


Counterexample-Guided Abstraction Refinement (CEGAR). CEGAR 
[57] is a popular framework for iteratively solving difficult model checking prob- 
lems. It is typically parameterized by the underlying model checking algorithm, 
which operates on an abstract system that is iteratively refined as needed. Pono 
provides a generic CEGAR base class, parameterized by a model checking engine 
through a template argument. We describe two example uses of the CEGAR 
infrastructure implemented in Pono. 


Operator Abstraction. This simple CEGAR algorithm uses uninterpreted func- 
tions (UF) to abstract potentially expensive theory operators (e.g. multiplica- 
tion). The implementation is parameterized by the set of operators to replace 
with UFs. The refinement step analyzes a counterexample trace by restoring 
the concrete theory operator semantics. If the trace is found to be spurious, con- 
straints are added to enforce the real semantics for the abstracted operators (e.g., 
equalities between certain abstract UFs and their theory operator counterparts), 
thus ruling out the spurious counterexample. 
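
The refinement loop can be illustrated on a toy finite-state system. In the sketch below, the operator "multiply by 3 modulo 16" is abstracted by an uninterpreted function that may return any value unless a lemma pins its result for a given argument. All names and the explicit-state search are assumptions made for illustration; Pono performs the corresponding steps symbolically with an SMT solver.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

DOMAIN = range(16)


def concrete_mul3(x: int) -> int:
    """The 'expensive' theory operator that is abstracted away."""
    return (x * 3) % 16


def abstract_check(init: int, bad: Callable[[int], bool],
                   lemmas: Dict[int, int]) -> Optional[List[int]]:
    """BFS on the abstract system; returns a shortest trace to a bad state, or None."""
    parent: Dict[int, Optional[int]] = {init: None}
    queue = deque([init])
    while queue:
        x = queue.popleft()
        if bad(x):
            trace: List[int] = []
            cur: Optional[int] = x
            while cur is not None:
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1]
        # Without a lemma, the uninterpreted function may return any value.
        successors = [lemmas[x]] if x in lemmas else list(DOMAIN)
        for y in successors:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return None


def cegar(init: int, bad: Callable[[int], bool]) -> Tuple[bool, int]:
    """Returns (property holds, number of refinement rounds)."""
    lemmas: Dict[int, int] = {}
    refinements = 0
    while True:
        trace = abstract_check(init, bad, lemmas)
        if trace is None:
            return True, refinements        # abstract system safe, so concrete is safe
        spurious = False
        for a, b in zip(trace, trace[1:]):
            if concrete_mul3(a) != b:
                lemmas[a] = concrete_mul3(a)  # enforce the real semantics at this point
                spurious = True
        if not spurious:
            return False, refinements       # the trace is a real counterexample
        refinements += 1


# System: x starts at 1 and x' = 3 * x mod 16. Can x ever become 0?
print(cegar(1, lambda x: x == 0))
```

Each lemma is an equality between the abstract function and the concrete operator at one argument, which matches the kind of refinement constraint described above. The loop terminates here because the domain is finite.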


Counterexample-Guided Prophecy. This CEGAR approach replaces array vari- 
ables with initially memoryless variables of uninterpreted sort and replaces the 
select and store array operators with UFs [58]. Due to the array theory seman- 
tics, it is not always possible to remove spurious counterexamples with quantifier- 
free refinement axioms over existing variables. However, instead of using poten- 
tially expensive quantifiers, the algorithm adds auxiliary variables (history and 
prophecy variables) [14], which can rule out spurious counterexamples of a given 
finite length. This approach has the effect of removing the need for array solv- 
ing and can sometimes prove properties using prophecy variables that would 
otherwise require a universally quantified invariant. 


Case Study with Algebraic Datatypes. To illustrate the flexibility of Pono’s 
SMT-based formalism, we next describe a case study with generalized algebraic 
theories (GATs) [29]. GATs are a rich formalism which can be used for high-level 
specifications of software or mathematical constructs. While the equality of two 
terms in a GAT is undecidable, one can ask the bounded question: “Does there 
exist a path of up to n rewrites to take a source term to a target term?” 
To model this question, we use algebraic datatypes to represent dependently-typed
abstract syntax trees (ASTs), paths through an AST (e.g., the 2nd
argument of the 3rd argument of a term’s 1st argument), and rewrite rules (e.g., 
succ(n+1) = succ(m+1) = succ(n) = succ(m)). Smt-Switch supports algebraic 
datatypes through the CVC4 [18] back-end. A rewrite function is encoded as a 
transition relation. The decision of which rule to apply and at which subpath 
to apply it is controlled by input variables, and a state variable represents the 
current AST term (initially set to the source term). We check the property 
that the target term is not reachable from the source term. Consequently, any 
discovered counterexample is a valid rewrite sequence, serving as a proof of an 
equality that holds in the theory. 
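
The bounded question can be sketched directly as a breadth-first search over terms. The term representation (nested tuples), the rule format, and the monoid example are assumptions chosen for illustration; the case study itself encodes the rewrite function as a transition relation and delegates the search to bounded model checking.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Optional, Tuple, Union

Term = Union[str, Tuple]                    # a constant name, or (op, arg, ...)
Rule = Callable[[Term], Optional[Term]]     # rewrites the root of a term, or returns None


def rewrites(term: Term, rules: List[Rule]) -> Iterator[Term]:
    """All terms obtained from `term` by one rule application at any subterm."""
    for rule in rules:
        out = rule(term)
        if out is not None:
            yield out
    if isinstance(term, tuple):
        op, *args = term
        for i, arg in enumerate(args):
            for new_arg in rewrites(arg, rules):
                yield (op, *args[:i], new_arg, *args[i + 1:])


def bounded_path(source: Term, target: Term, rules: List[Rule],
                 n: int) -> Optional[List[Term]]:
    """Search for a rewrite sequence of at most n steps from source to target."""
    parent: Dict[Term, Optional[Term]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        term, depth = queue.popleft()
        if term == target:
            path: List[Term] = []
            cur: Optional[Term] = term
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if depth < n:
            for nxt in rewrites(term, rules):
                if nxt not in parent:
                    parent[nxt] = term
                    queue.append((nxt, depth + 1))
    return None


# Monoid identity laws used as rewrite rules: e * x -> x and x * e -> x.
def left_id(t: Term) -> Optional[Term]:
    return t[2] if isinstance(t, tuple) and t[0] == "mul" and t[1] == "e" else None


def right_id(t: Term) -> Optional[Term]:
    return t[1] if isinstance(t, tuple) and t[0] == "mul" and t[2] == "e" else None


src = ("mul", ("mul", "e", "a"), "e")       # (e * a) * e
path = bounded_path(src, "a", [left_id, right_id], 2)
print(path)
```

The returned path plays the role of the counterexample trace: it lists the intermediate terms of a rewrite sequence and thereby certifies the equality of the source and target terms.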

The workflow accepts a GAT input, produces an SMT encoding optimized 
for that particular theory, and then parses user-provided source and target terms 
into this theory before running bounded model checking. We used Pono to suc- 
cessfully find equalities in the theories of Boolean algebras, preorders, monoids, 
categories, and read-over-write arrays. This case study demonstrates Pono’s abil- 
ity to model and model check unconventional systems. 


4 Related Work 


Existing academic model checkers span a wide range of supported theories, 
modeling capabilities, and implemented algorithms. An important early model 
checker was SMV [61], which pioneered symbolic model checking of temporal 
logic properties [67] through BDDs [28]. NuSMV [32] and NuSMV2 [33] refined and 
extended the tool, followed by nuXmv [30] — a closed-source tool which added sup- 
port for various SMT-based verification techniques using the SMT solver Math- 
SAT5 [36]. Spin [52] is a well-known explicit-state model checker with extensive 
support for partial order reduction and other optimizations. 

Several model checkers specifically target hardware verification. ABC [26] is a 
well-established, state-of-the-art bit-level hardware model checker based on SAT 
solving. CoSA [60] is an open-source model checker implemented in Python using 
the Python solver-agnostic SMT solving library, PySMT [45]. Although CoSA also 
relies on a generic API similar to Smt-Switch, the Python implementation intro- 
duces significant overhead, limiting its ability to include efficient procedures that 
must be implemented outside of the underlying SMT solver (e.g., CEGAR loops 
and some IC3 variants). AVR [48] is a state-of-the-art SMT-based hardware model 
checker supporting several standard model checking algorithms. It also imple- 
ments a novel technique: IC3 via syntax-guided abstraction [47]. Importantly, 
AVR won the hardware model checking competition in 2020 [22], outperforming 
the previous state-of-the-art SAT-based model checker, ABC. AVR is currently 
closed-source, making it unsuitable for several of the use-cases targeted by our 
work, but a binary is available on GitHub [1]. 

There are several SMT-based model checkers focused on parameterized pro- 
tocols. MCMT [46], the open-source extension Cubicle [49], and related sys- 
tems [15,16] perform backward-reachability analysis over infinite-state arrays. 
Other open-source SMT-based model checkers include: i) ic3ia [13], an
example implementation of IC3IA built on MathSAT [36]; ii) Kind2 [31], a
model checker for Lustre programs; iii) Sally [42], a model checker for infinite-state
systems that uses the SAL language [65] and MCMT, an extension of
the SMT-LIB text format for declaring transition systems; iv) Spacer [56], a
Constrained Horn Clauses (CHC) solver built into the open-source Z3 [64] SMT
solver, also based on an IC3-style algorithm; and v) Intrepid [27], a model
checker focusing primarily on the control engineering domain.

Pono is open-source, SMT-based, and implements a variety of model checking 
algorithms over transition systems. Furthermore, in contrast to the tools which 
focus on more limited domains, it has support for a wide set of SMT theories 
including fixed-width bit-vectors, arithmetic, arrays, and algebraic datatypes. 
To our knowledge all current open-source SMT-based model checkers tie the 
implementation directly to an existing SMT solver or use PySMT or the SMT- 
LIB text format to interact with arbitrary solvers. In contrast, Pono makes use 
of the C++ API of Smt-Switch to efficiently manipulate SMT terms and solvers 
in memory without a need for a textual interface. This allows Pono to provide 
both flexibility and performance. Finally, like the new model checker Intrepid, 
Pono provides an extensive API, which can be adapted and extended as needed. 
However, the focus is broader than Intrepid in terms of application domains. 


5 Evaluation 


In this section, we evaluate Pono³ against current state-of-the-art model checkers
across several domains. Our evaluation is not intended to be exhaustive. Rather, 
we highlight the breadth of Pono by selecting four sets of benchmarks in three 
diverse categories and a few reasonable competitors for each. The benchmarks are 
drawn from the following theories: i) unbounded quantifier-free arrays indexed by 
integers; ii) quantifier-free linear arithmetic over reals and integers; and iii) hard- 
ware verification over quantifier-free bit-vectors and (finite, bit-vector indexed) 
arrays. We ran all experiments on a 3.5 GHz Intel Xeon E5-2637 v4 CPU with 
a timeout of 1 h and a memory limit of 16 GB. For all results, we also include
the average runtime of solved instances in seconds. For portfolio solving, we ran 
each configuration in its own process with the full time and memory resources. 
In the first two categories, Pono used MathSAT5 [36] as the underlying SMT 
solver and interpolant [37,40,62] producer. For the hardware benchmarks, it 
used MathSAT5, Boolector [66], or both, depending on the configuration. 


Arrays. We evaluate Pono on the integer-indexed array benchmark set of [44]. 
These are Constrained Horn Clauses (CHC) benchmarks inspired by software 
verification problems. Although there are no quantifiers in the benchmarks them- 
selves, most cannot be proved safe without strengthening the property with 
quantified invariants. We compare against: i) freqhorn [44], a state-of-the-art 
CHC solver for this type of problem; ii) prophic3 [8], a recent method that 


3 GitHub commit c175a302857f00229a0919d5cc8fc3f78d04a26. 
       | Pono     | prophic3 | prophic3-SA | freqhorn | nuXmv
solved | 71 (16s) | 71 (20s) | 66 (31s)    | 69 (6s)  | 4 (51s)


Fig. 2. Results on Freqhorn Array benchmarks (81 total), all expected to be safe. 


result | SystemC (43 total)    | Lustre (951 total)
       | Pono      | nuXmv     | Pono      | nuXmv    | kind2
safe   | 18 (673s) | 21 (571s) | 521 (10s) | 516 (8s) | 506 (2s)
unsafe | 14 (325s) | 15 (479s) | 412 (5s)  | 412 (1s) | 409 (0.2s)
total  | 32 (521s) | 36 (533s) | 933 (8s)  | 928 (5s) | 915 (1s)


Fig. 3. Results on arithmetic benchmarks. 


outperforms freqhorn [58]; and iii) nuXmv, which does not support quantified
invariants, to illustrate that most of these benchmarks do require them.
freqhorn takes the CHC format natively, and we used scripts from the ic3ia
and nuXmv distributions to translate the CHC input to SMV and the Verification
Modulo Theories (VMT) format [38] (an annotated SMT-LIB file representing
a transition system) for the other tools. We ran Pono with Counterexample-
Guided Prophecy using IC3IA as the underlying model checking technique. We 
ran prophic3 with both of the option sets used in their paper, and we ran the 
default configuration of freqhorn. Our results are shown in Fig. 2. We observe 
that Pono solves the same number of benchmarks as the reference implementa- 
tion prophic3 and is a bit faster. 


Arithmetic. We next evaluate Pono on two sets of arithmetic benchmarks, 
both from the nuXmv distribution’s example directory. The first uses linear real 
arithmetic, and the second uses linear integer arithmetic. Figure 3 displays the
results on both benchmark sets. 


Linear Real Arithmetic. We chose the systemc QF_LRA example benchmarks, 
because this is the largest set of linear real arithmetic benchmarks in the subset 
of SMV supported by Pono.⁴ We ran both nuXmv and Pono with BMC and
IC3IA in a portfolio. For both model checkers, BMC did not contribute any 
unique solves. We observe that Pono is quite competitive with nuXmv on nuXmv’s 
own benchmarks. 


Linear Integer Arithmetic. We also evaluate Pono on a set of Lustre benchmarks 
which use quantifier-free linear integer arithmetic. We obtained the Lustre bench- 
marks from the Kind [50] website [6] and the SMV translation of the benchmarks 
from the distribution of nuXmv. We compare against both nuXmv and Kind2 [31], 
the latest version of Kind. We ran all tools with a portfolio of techniques. For 
Pono and nuXmv we ran BMC and IC3IA. For Kind2 we ran two configurations 
suggested by the authors: the default configuration with Z3 [64] and the default 
configuration, but with Yices2 [41] as the main SMT solver. Since the default 


4 Pono does not yet support enumeration types.
result | BV (324 total)                                  | BV + Array (315 total)
       | Pono       | AVR        | CoSA2      | sygus-apdr | Pono       | AVR       | CoSA2
safe   | 183 (283s) | 215 (115s) | 98 (283s)  | 115 (545s) | 252 (224s) | 274 (63s) | 209 (299s)
unsafe | 47 (314s)  | 47 (220s)  | 41 (232s)  | 15 (279s)  | 19 (208s)  | 19 (352s) | 19 (204s)
total  | 230 (289s) | 262 (134s) | 139 (268s) | 130 (514s) | 271 (223s) | 293 (82s) | 228 (291s)


Fig. 4. Results on HWMCC2020 benchmarks. 


configurations of Kind2 run 8 techniques in parallel, we gave each configura- 
tion 8 cores. Additionally, we ran Kind2’s BMC and IC3 implementations using 
MathSAT5 as the SMT solver, because this is closest to the other model check- 
ers’ configurations. The default with Z3 was the best configuration of Kind2. 
We observe that Pono solves the most benchmarks overall. Once again, BMC 
contributed no unique solves for any model checker. 


Hardware Verification. Finally, we evaluate Pono on the 2020 Hardware 
Model Checking Competition (HWMCC) benchmarks. The benchmarks are split 
into bitvector-only and bitvector plus array categories. We evaluate against 
AVR [1,48] and CoSA2 [4] (a previous name and version of Pono), the winners 
of HWMCC 2020 and HWMCC 2019, respectively. We also compare against 
sygus-apdr (the reference implementation of SyGuS-PDR [73]) on the bitvec- 
tor benchmarks (as sygus-apdr targets bitvectors). We ran all 16 configura- 
tions of AVR from their HWMCC 2020 entry: several configurations of BMC and 
k-induction, and 11 configurations of IC3SA. We ran the 4 configurations of 
CoSA2 from the HWMCC 2019 entry: two BMC configurations, k-induction, and 
interpolant-based model checking. We ran sygus-apdr with 4 different param- 
eters controlling the grammar for lemmas. For the bitvector-only benchmarks, 
we ran Pono with 10 configurations: 3 configurations of IC3IA, 2 configurations 
of IC3SA, 2 configurations of SyGuS-PDR, IC3Bits, k-induction, and BMC. For 
the array benchmarks, we ran 5 configurations: 3 configurations of IC3IA (one 
with Counterexample-Guided Prophecy), k-induction, and BMC. We show our 
results on the HWMCC 2020 benchmarks in Fig. 4. AVR wins in both categories, 
although Pono is fairly competitive, outperforming the other tools. 

These results show that Pono is well on its way to being both widely applica- 
ble and performance-competitive. The arithmetic experiments demonstrate the 
capabilities of its IC3IA engine, but other engines have some room for improve- 
ment. In particular, both IC3SA and SyGuS-PDR were recently added to Pono, 
and its implementation of these algorithms still lags the corresponding imple- 
mentations in AVR and sygus-apdr, respectively. There are also some features 
that are known to help performance and are not yet implemented in Pono. For 
example, the best configurations of AVR use UF data abstraction. This differs 
from our UF operator abstraction in that it replaces all abstracted data with 
uninterpreted sorts and learns targeted data refinement axioms. 
6 Conclusion


We have presented Pono: a new open-source, SMT-based, and solver-agnostic 
model checker. We described its capabilities, design, and the emphasis on flexi- 
bility and extensibility in addition to performance. We demonstrated empirically 
that the suite of model checking algorithms is competitive with state-of-the-art 
tools. Pono has already been used in several research projects and two graduate- 
level classes. With this promising start, we believe that Pono is poised to have 
an enduring and beneficial impact on research, education, and model checking 
applications. Future work includes adding support for temporal properties [67] 
and improving and adding to Pono’s engines, in particular the IC3 variants.


Acknowledgements. This work was partially supported by the National Science 
Foundation Graduate Research Fellowship Program under Grant No. DGE-1656518. 
Any opinions, findings, and conclusions or recommendations expressed in this material 
are those of the author(s) and do not necessarily reflect the views of the National Science 
Foundation. This work was also supported by the Defense Advanced Research Projects 
Agency, grants FA8650-18-1-7818 and FA8650-18-2-7854. We thank these sponsors and 
our industry collaborators for their support. 


References 


1. AVR distribution. https://github.com/aman-goel/avr

2. btor2tools. https://github.com/Boolector/btor2tools

3. CMake. https://cmake.org

4. cosa2. https://github.com/upscale-project/cosa2

5. GoogleTest. https://github.com/google/googletest

6. Kind site. http://clc.cs.uiowa.edu/Kind/index.php?page=experimental-results

7. Pono. https://github.com/upscale-project/pono

8. ProphIC3 (commit: 497e2fbfb813bcf0a2c3bcb5b55ad47b2a678611).
https://github.com/makaimann/prophic3

9. pytest 5.4.2. https://github.com/pytest-dev/pytest

10. IEEE Std 1364-2005, pp. 1-590 (2006)

11. CoreIR (2017). https://github.com/rdaly525/coreir

12. Google Perftools (2017). https://github.com/gperftools/gperftools

13. ic3ia. https://es-static.fbk.eu/people/griggio/ic3ia/index.html. Accessed 2020

14. Abadi, M., Lamport, L.: The existence of refinement mappings. In: Proceedings of 
LICS, pp. 165-175, July 1988 

15. Alberti, F., Bruttomesso, R., et al.: SAFARI: SMT-based abstraction for arrays 
with interpolants. In: Proceedings of CAV, pp. 679-685 (2012) 

16. Alberti, F., Ghilardi, S., Sharygina, N.: Booster: an acceleration-based verification 
framework for array programs. In: Proceedings of ATVA, pp. 18-23 (2014) 

17. Barrett, C., Fontaine, P., Tinelli, C.: The Satisfiability Modulo Theories Library 
(SMT-LIB) (2016). www.smt-lib.org 

18. Barrett, C.W., et al.: CVC4. In: Proceedings of CAV, pp. 171-177 (2011) 


19. Barrett, C.W., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo
theories. In: Handbook of Satisfiability, pp. 825-885 (2009)

20. Behnel, S., Bradshaw, R., Citro, C., Dalcin, L., Seljebotn, D.S., Smith, K.: Cython:
the best of both worlds. Comput. Sci. Eng. 2, 31-39 (2011)

21. Biere, A., Cimatti, A., Clarke, E.M., Zhu, Y.: Symbolic model checking without
BDDs. In: Proceedings of TACAS, pp. 193-207 (1999)

22. Biere, A., Froleyks, N., Preiner, M.: Hardware model checking competition (2020).
http://fmv.jku.at/hwmcc20/

23. Birgmeier, J., Bradley, A., Weissenbacher, G.: Counterexample to induction-guided
abstraction-refinement (CTIGAR). In: Proceedings of CAV, pp. 831-848 (2014)

24. Bjørner, N., Gurfinkel, A.: Property directed polyhedral abstraction. In:
Proceedings of VMCAI, pp. 263-281 (2015)

25. Bradley, A.: SAT-based model checking without unrolling. In: Proceedings of
VMCAI, pp. 70-87 (2011)

26. Brayton, R., Mishchenko, A.: ABC: An academic industrial-strength verification
tool. In: Proceedings of CAV, pp. 24-40 (2010)

27. Bruttomesso, R.: Intrepid: An SMT-based model checker for control engineering
and industrial automation. In: SMT Workshop, August 2019

28. Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE
Trans. Comput. 8, 677-691 (1986)

29. Cartmell, J.: Generalised algebraic theories and contextual categories. Ann. Pure
Appl. Logic 209-243 (1986)

30. Cavada, R., Cimatti, A., et al.: The nuXmv symbolic model checker. In:
Proceedings of CAV, pp. 334-342 (2014)

31. Champion, A., Mebsout, A., Sticksel, C., Tinelli, C.: The Kind 2 model checker.
In: Proceedings of CAV, pp. 510-517 (2016)

32. Cimatti, A., Clarke, E.M., Giunchiglia, F., Roveri, M.: NUSMV: A new symbolic
model verifier. In: Proceedings of CAV, pp. 495-499 (1999)

33. Cimatti, A., Clarke, E.M., et al.: NuSMV 2: an opensource tool for symbolic model
checking. In: Proceedings of CAV, pp. 359-364 (2002)

34. Cimatti, A., Griggio, A., Irfan, A., et al.: Incremental linearization for satisfiability
and verification modulo nonlinear arithmetic and transcendental functions. ACM
Trans. Comput. Log. 19:1-19:52 (2018)

35. Cimatti, A., Griggio, A., Mover, S., Tonetta, S.: Infinite-state invariant checking
with IC3 and predicate abstraction. FMSD 3, 190-218 (2016)

36. Cimatti, A., Griggio, A., Schaafsma, B., Sebastiani, R.: The MathSAT5 SMT
Solver. In: Piterman, N., Smolka, S. (eds.) Proceedings of TACAS (2013)

37. Cimatti, A., Griggio, A., Sebastiani, R.: Efficient generation of Craig interpolants
in satisfiability modulo theories. ACM Trans. Comput. Log. (1), 7:1-7:54 (2010)

38. Cimatti, A., et al.: Verification Modulo Theories (2011). http://www.vmt-lib.org

39. Clarke, E., Henzinger, T., et al.: Handbook of Model Checking (2018)

40. Craig, W.: Linear reasoning. A new form of the Herbrand-Gentzen theorem. J.
Symb. Log. (3), 250-268 (1957)

41. Dutertre, B.: Yices 2.2. In: Proceedings of CAV, pp. 737-744 (2014)

42. Dutertre, B., Jovanovic, D., Navas, J.A.: Verification of fault-tolerant protocols
with Sally. In: Proceedings of NFM, pp. 113-120 (2018)

43. Eén, N., Mishchenko, A., Brayton, R.K.: Efficient implementation of property
directed reachability. In: Proceedings of FMCAD, pp. 125-134 (2011)

44. Fedyukovich, G., Prabhu, S., Madhukar, K., Gupta, A.: Quantified invariants via
syntax-guided synthesis. In: Proceedings of CAV, pp. 259-277 (2019)




45. Gario, M., Micheli, A.: PySMT: A solver-agnostic library for fast prototyping of
SMT-based algorithms. In: Proceedings of SMT Workshop, pp. 373-384 (2015)

46. Ghilardi, S., Ranise, S.: MCMT: a model checker modulo theories. In: Automated
Reasoning, pp. 22-29 (2010)

47. Goel, A., Sakallah, K.A.: Model checking of Verilog RTL using IC3 with
syntax-guided abstraction. In: Proceedings of NFM, pp. 166-185 (2019)

48. Goel, A., Sakallah, K.A.: AVR: abstractly verifying reachability. In: Proceedings
of TACAS, pp. 413-422 (2020)

49. Goel, A., Krstic, S., Leslie, R., Tuttle, M.R.: SMT-based system verification with
DVF. In: Proceedings of SMT Workshop, pp. 32-43 (2012)

50. Hagen, G., Tinelli, C.: Scaling up the formal verification of Lustre programs with
SMT-based techniques. In: Proceedings of FMCAD, pp. 1-9 (2008)

51. Ho, Y., Mishchenko, A., Brayton, R.K.: Property directed reachability with
word-level abstraction. In: Proceedings of FMCAD, pp. 132-139 (2017)

52. Holzmann, G.J.: The SPIN Model Checker - primer and reference manual (2004)

53. Irfan, A., Cimatti, A., Griggio, A., Roveri, M., Sebastiani, R.: Verilog2SMV: a tool
for word-level verification. In: Proceedings of DATE, pp. 1156-1159 (2016)

54. Jovanovic, D., Dutertre, B.: Property-directed k-induction. In: Proceedings of
FMCAD, pp. 85-92 (2016)

55. K., H.G.V., Fedyukovich, G., Gurfinkel, A.: Word level property directed
reachability. In: Proceedings of ICCAD, pp. 107:1-107:9 (2020)

56. Komuravelli, A., Gurfinkel, A., et al.: Automatic abstraction in SMT-based
unbounded software model checking. In: Proceedings of CAV, pp. 846-862 (2013)

57. Kroening, D., Groce, A., Clarke, E.M.: Counterexample guided abstraction
refinement via program execution. In: Proceedings of ICFEM, pp. 224-238 (2004)

58. Mann, M., Irfan, A., et al.: Counterexample-guided prophecy for model checking
modulo the theory of arrays. In: Proceedings of TACAS, pp. 113-132 (2021)

59. Mann, M., Wilson, A., et al.: SMT-Switch: a Solver-agnostic C++ API for SMT
Solving. In: Proceedings of SAT (2021)

60. Mattarei, C., Mann, M., Barrett, C., et al.: CoSA: Integrated verification for agile
hardware design. In: Proceedings of FMCAD, pp. 1-5 (2018)

61. McMillan, K.: Symbolic model checking - an approach to the state explosion
problem. Ph.D. thesis, Carnegie Mellon University (1992)

62. McMillan, K.L.: Interpolants and symbolic model checking. In: Proceedings of
VMCAI, pp. 89-90 (2007)

63. McMillan, K.L., Padon, O.: Ivy: a multi-modal verification tool for distributed
algorithms. In: Proceedings of CAV, pp. 190-202 (2020)

64. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Proceedings of TACAS,
pp. 337-340 (2008)

65. de Moura, L., et al.: Sal 2. In: Proceedings of CAV, pp. 496-500 (2004)

66. Niemetz, A., Preiner, M., Wolf, C., Biere, A.: Btor2, BtorMC and Boolector 3.0.
In: Proceedings of CAV, pp. 587-595 (2018)

67. Pnueli, A.: The temporal logic of programs. In: Proceedings of FOCS, pp. 46-57
(1977)

68. Sheeran, M., Singh, S., Stalmarck, G.: Checking safety properties using induction
and a SAT-solver. In: Proceedings of FMCAD, pp. 108-125 (2000)

69. Silva, J-P.M., Lynce, I., Malik, S.: Conflict-driven clause learning SAT solvers. In:
Handbook of Satisfiability, pp. 131-153 (2009)

70. Tonetta, S.: Abstract model checking without computing the abstraction. In:
Proceedings of FM, pp. 89-105 (2009)




71. Welp, T., Kuehlmann, A.: QF_BV model checking with property directed
reachability. In: Proceedings of DATE, pp. 791-796 (2013)

72. Wolf, C., Glaser, J., Kepler, J.: Yosys-a free Verilog synthesis suite. In: Proceedings 
of Austrochip Workshop (2013) 

73. Zhang, H., Gupta, A., Malik, S.: Syntax-guided synthesis for lemma generation in 
hardware model checking. In: Proceedings of VMCAI (2021) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 


Logical Foundations 




Towards a Trustworthy Semantics-Based 
Language Framework via Proof Generation 


Xiaohong Chen¹, Zhengyao Lin¹, Minh-Thai Trinh²,
and Grigore Rosu¹


1 University of Illinois at Urbana-Champaign, 
Champaign, USA 
{xc3,z138,grosu}@illinois.edu 
2 Advanced Digital Sciences Center, 
Illinois at Singapore, Singapore, Singapore 
trinhmt@illinois.edu 


Abstract. We pursue the vision of an ideal language framework, where 
programming language designers only need to define the formal syntax 
and semantics of their languages, and all language tools are automati- 
cally generated by the framework. Due to the complexity of such a lan- 
guage framework, it is a big challenge to ensure its trustworthiness and 
to establish the correctness of the autogenerated language tools. In this 
paper, we propose an innovative approach based on proof generation. 
The key idea is to generate proof objects as correctness certificates for 
each individual task that the language tools conduct, on a case-by-case 
basis, and use a trustworthy proof checker to check the proof objects. 
This way, we avoid formally verifying the entire framework, which is 
practically impossible, and thus can make the language framework both 
practical and trustworthy. As a first step, we formalize program execu- 
tion as mathematical proofs and generate their complete proof objects. 
The experimental result shows that the performance of our proof object 
generation and proof checking is very promising. 


Keywords: Semantic framework - Proof generation - Proof checking 


1 Introduction 


Unlike natural languages that allow vagueness and ambiguity, programming lan- 
guages must be precise and unambiguous. Only with rigorous definitions of pro- 
gramming languages, called the formal semantics, can we guarantee the reliabil- 
ity, safety, and security of computing systems. 

Our vision is thus an ideal language framework based on the formal semantics 
of programming languages. Shown in Fig. 1, an ideal language framework is one 
where language designers only need to define the formal syntax and semantics 
of their language, and all language tools are automatically generated by the 
framework. The correctness of these language tools is established by generating 
complete mathematical proofs as certificates that can be automatically machine- 
checked by a trustworthy proof checker. 
Fig. 1. An ideal language framework vision; language tools are autogenerated, with
machine-checkable mathematical proofs as correctness certificates. 


The K language framework (https://kframework.org) is in pursuit of the 
above ideal vision. It provides a simple and intuitive front end language (i.e., a 
meta-language) for language designers to define the formal syntax and semantics 
of other programming languages. From such a formal language definition, the 
framework automatically generates a set of language tools, including a parser, 
an interpreter, a deductive verifier, a program equivalence checker, among many 
others [9,24]. K has obtained much success in practice, and has been used to 
define the complete executable formal semantics of many real-world languages, 
such as C [12], Java [2], JavaScript [21], Python [13], Ethereum Virtual Machine
bytecode [15], and x86-64 [10], from which their implementations and formal
analysis tools are automatically generated. Some commercial products [14, 18] 
are powered by these autogenerated implementations and/or tools. 

What is missing in K (compared to the ideal vision in Fig. 1) is its ability 
to generate proof objects as correctness certificates. The current K implemen- 
tation is a complex artifact with over 500,000 lines of code written in 4 pro- 
gramming languages, with new code committed on a weekly basis. Its code base 
includes complex data structures, algorithms, optimizations, and heuristics to 
support the various features such as defining formal language syntax using BNF 
grammar, defining computation configurations as constructor terms, defining 
formal semantics using rewrite rules, specifying arbitrary evaluation strategies, 
and defining the binding behaviors of binders (Sect. 3). The large code base and
rich features make it challenging to formally verify the correctness of K. 

Our main contribution is the proposal of a practical approach to establish- 
ing the correctness of a complex language framework, such as K, via proof object 
generation. Our approach consists of the following main components: 


1. A small logical foundation of K; 
2. Proof parameters that are provided by K as the hints for proof generation; 
3. A proof object generator that generates proof objects from proof parameters;
4. A fast and trustworthy third-party proof checker that verifies proof objects. 


The key idea that makes our approach practical is that we establish the
correctness not of the entire framework, but of each individual language task that it
conducts, on a case-by-case basis. This idea is not limited to K; it is also
applicable to other existing language frameworks and/or formal semantics approaches.

As a first step, we formalize program execution as mathematical proofs and 
generate their complete proof objects. The experimental results (Table 1) show
promising performance of the proof object generation and proof checking. For
example, for a 100-step program execution trace, the complete proof object has
1.6 million lines of code and takes only 5.6 s to proof-check.

We organize the rest of the paper as follows. We give an overview of our app- 
roach in Sect. 2. We introduce K and discuss the generation of proof parameters 
in Sect. 3. We discuss matching logic—the logical foundation of K—in Sect. 4. 
We then compile K to matching logic in Sect. 5, and discuss proof object gener- 
ation in Sect. 6. We discuss the limitations of our current implementation and 
show the experiment results in Sects. 7 and 8, respectively. Finally, we discuss 
related work in Sect. 9 and conclude the paper in Sect. 10. 


2 Our Approach Overview 


We give an overview of our approach via the following four main components: 
(1) a logical foundation of K, (2) proof parameters, (3) proof object generation, 
and (4) a trustworthy proof checker. 


Logical Foundation of K. Our approach is based on matching logic [5,22]. 
Matching logic is the logical foundation of K, in the following sense: 


1. The K definition (i.e., the language definition in Fig. 1) of a programming
language $L$ corresponds to a matching logic theory $\Gamma^L$, which, roughly
speaking, consists of a set of logical symbols that represents the formal syntax of
$L$, and a set of logical axioms that specify the formal semantics.

2. All language tools in Fig. 1 and all language tasks that K conducts are
formally specified by matching logic formulas. For example, program execution
is specified (in our approach) by the following matching logic formula:


$\varphi_{init} \Rightarrow \varphi_{final}$ (1)


where $\varphi_{init}$ is the formula that specifies the initial state of the execution,
$\varphi_{final}$ specifies the final state, and “$\Rightarrow$” states the
rewriting/reachability relation between states (see Sect. 5.1).

3. There exists a matching logic proof system that defines the provability relation
$\vdash$ between theories and formulas. For example, the correctness of the above
execution from $\varphi_{init}$ to $\varphi_{final}$ is witnessed by the formal proof:


$\Gamma^L \vdash \varphi_{init} \Rightarrow \varphi_{final}$ (2)

Therefore, matching logic is the logical foundation of K. The correctness of K
conducting one language task is reduced to the existence of a formal proof in 
matching logic. Such formal proofs are encoded as proof objects, discussed below. 


Proof Parameters. A proof parameter is the necessary information that K 
should provide to help generate proof objects. For program execution, such as 
Eq. (2), the proof parameter includes the following information: 


- the complete execution trace $\varphi_0, \varphi_1, \ldots, \varphi_n$, where
$\varphi_0 = \varphi_{init}$ and $\varphi_n = \varphi_{final}$; we call
$\varphi_0, \ldots, \varphi_n$ the intermediate snapshots of the execution;

- for each step from $\varphi_i$ to $\varphi_{i+1}$, the rewriting information that
consists of the rewrite/semantic rule $\varphi_{lhs} \Rightarrow \varphi_{rhs}$ that is
applied, and the corresponding substitution $\theta$ such that
$\varphi_{lhs}\theta = \varphi_i$.


In other words, a proof parameter of a program execution trace contains the 
complete information about how such an execution is carried out by K. The 
proof parameter, once generated by K, is passed to the proof object generator 
to generate the corresponding proof object, discussed below. 


Proof Object Generation. In our approach, a proof object is an encoding 
of matching logic formal proofs, such as Eq. (2). Proof objects are generated by 
a proof object generator from the proof parameters provided by K. At a high 
level, a proof object for program execution, such as Eq. (2), consists of: 


1. the formalization of matching logic and its provability relation $\vdash$;

2. the formalization of the formal semantics $\Gamma^L$ as a logical theory, which
includes axioms that specify the rewrite/semantic rules $\varphi_{lhs} \Rightarrow \varphi_{rhs}$;

3. the formal proofs of all one-step executions, i.e., $\Gamma^L \vdash \varphi_i \Rightarrow \varphi_{i+1}$ for all $i$;

4. the formal proof of the final proof goal $\Gamma^L \vdash \varphi_{init} \Rightarrow \varphi_{final}$.


Our proof objects have a linear structure, which implies a nice separation of 
concerns. Indeed, Item 1 is only about matching logic and is not specific to any 
programming languages/language tasks, so we only need to develop and
proof-check it once and for all. Item 2 is specific to the language semantics $\Gamma^L$ but is
independent of the actual program executions, so it can be reused in the proof 
objects of various language executions for the same programming language L. 
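The step from Item 3 to Item 4 can be pictured as chaining one-step claims into a single end-to-end claim. The following is a minimal sketch only: the function name `chain` is hypothetical, snapshots are modeled as plain Python values, and combining steps by transitivity of the reachability relation is our simplifying assumption here, not a statement of the proof rules that the generated proof objects actually use.

```python
def chain(one_step_claims):
    """Combine claims (phi_i, phi_{i+1}) into one claim (phi_0, phi_n).

    Each claim stands for an already-proved one-step execution.
    Adjacent claims must agree on their shared snapshot.
    """
    if not one_step_claims:
        raise ValueError("at least one one-step claim is required")
    start, end = one_step_claims[0]
    for lhs, rhs in one_step_claims[1:]:
        if lhs != end:
            raise ValueError(f"claims do not connect: {end} vs {lhs}")
        end = rhs
    return (start, end)


# Two one-step claims combine into the end-to-end goal.
print(chain([("s0", "s1"), ("s1", "s2")]))
```

A mismatch between adjacent claims is rejected, which mirrors the fact that the final proof goal can only be assembled from one-step proofs that line up.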


A Trustworthy Proof Checker. A proof checker is a small program that 
checks whether the formal proofs encoded in a proof object are correct. The proof 
checker is the main trust base of our work. In this paper, we use Metamath [20],
a third-party proof checking tool that is simple, fast, and trustworthy, to
formalize matching logic and encode its formal proofs.
 1  module IMP-SYNTAX
 2    imports DOMAINS-SYNTAX
 3    syntax Exp ::=
 4        Int
 5      | Id
 6      | Exp "+" Exp               [left, strict]
 7      | Exp "-" Exp               [left, strict]
 8      | "(" Exp ")"               [bracket]
 9    syntax Stmt ::=
10        Id "=" Exp ";"            [strict(2)]
11      | "if" "(" Exp ")"
12           Stmt Stmt              [strict(1)]
13      | "while" "(" Exp ")" Stmt
14      | "{" Stmt "}"              [bracket]
15      | "{" "}"
16      > Stmt Stmt                 [left, strict(1)]
17    syntax Pgm ::= "int" Ids ";" Stmt
18    syntax Ids ::= List{Id,","}
19  endmodule
20  module IMP imports IMP-SYNTAX
21    imports DOMAINS
22    syntax KResult ::= Int
23    configuration
24      <T> <k> $PGM:Pgm </k>
25          <state> .Map </state> </T>
26    rule <k> X:Id => I ...</k>
27         <state>... X |-> I ...</state>
28    rule I1 + I2 => I1 +Int I2
29    rule I1 - I2 => I1 -Int I2
30    rule <k> X = I:Int => I ...</k>
31         <state>... X |-> (_ => I) ...</state>
32    rule {} S:Stmt => S
33    rule if(I) S _ => S requires I =/=Int 0
34    rule if(0) _ S => S
35    rule while(B) S => if(B) {S while(B) S} {}
36    rule <k> int (X, Xs => Xs) ; S </k>
37         <state>... (. => X |-> 0) </state>
38    rule int .Ids ; S => S
39  endmodule


Fig. 2. The complete K formal definition of an imperative language IMP. 


Summary. Our approach to establishing the correctness of K is based on its 
logical foundation—matching logic. We formalize language semantics as logi- 
cal theories, and program executions as formulas and proof goals, whose proof 
objects are automatically generated and proof-checked. Our proof objects have 
a linear structure that allows easy reuse of their components. The key
characteristics of our logic-based approach are the following:


- It is faithful to the real K implementation because proof objects are generated
from proof parameters, which include all execution snapshots and the actual
rewriting information, provided by K.

- It is practical because proof objects are generated for each program execution
on a case-by-case basis, avoiding the verification of the entire K.

- It is trustworthy because the autogenerated proof objects are checked using
the trustworthy third-party Metamath proof checker.


3 K Framework and Generation of Proof Parameters 


3.1 K Overview 


K is an effort in realizing the ideal language framework vision in Fig. 1. An easy 
way to understand K is to look at it as a meta-language that can define other 
programming languages. In Fig. 2, we show an example K language definition 
of an imperative language IMP. In the 39-line definition, we completely define 
the formal syntax and the (executable) formal semantics of IMP, using a front 
end language that is easy to understand. From this language definition, K can 
generate all language tools for IMP, including its parser, interpreter, verifier, etc. 

We use IMP as an example to illustrate the main K features. There are two
modules: IMP-SYNTAX defines the syntax and IMP defines the semantics using rewrite
rules. Syntax is defined as BNF grammars. The keyword syntax introduces production
rules, which can have attributes that specify additional syntactic and/or
semantic information. For example, the syntax of if-statements is defined in lines 11-12
and has the attribute [strict(1)], meaning that the evaluation order is strict in
the first argument, i.e., the condition of an if-statement.

In the module IMP, we define the configurations of IMP and its formal
semantics. A configuration (lines 23-25) is a constructor term that has all
semantic information needed to execute programs. IMP configurations are simple,
consisting of the IMP code and a program state that maps variables to values. We
organize configurations using (semantic) cells: <k> is the cell of IMP code and
<state> is the cell of program states. In the initial configuration (lines 24-25),
<state> is empty and <k> contains the IMP program that we pass to K for
execution (represented by the special K variable $PGM).

We define formal semantics using rewrite rules. In lines 26-27, we define the
semantics of variable lookup, where we match on a variable X in the <k> cell
and look up its value I in the <state> cell, by matching on the binding X |-> I.
Then, we rewrite X to I, denoted by X => I in the <k> cell in line 26. Rewrite
rules in K are similar to those in rewrite engines such as Maude [7].


A Running Example. IMP is too complex as a running example so we
introduce a simpler one: TWO-COUNTERS. Although simple, TWO-COUNTERS still
uses the core features of defining formal syntax as grammars and formal
semantics as rewrite rules.

1  module TWO-COUNTERS
2    imports INT
3    syntax State ::= "<" Int "," Int ">"
4    configuration $PGM:State
5    rule <M, N> => <M -Int 1, N +Int M>
6      requires M >Int 0
7  endmodule

Fig. 3. Running example TWO-COUNTERS.

TWO-COUNTERS is a tiny language that defines a state machine with two
counters. Its computation configuration is simply a pair $(m, n)$ of two integers $m$ and
$n$, and its semantics is defined by the following (conditional) rewrite rule:


$(m, n) \Rightarrow (m - 1, n + m) \quad \text{if } m > 0$ (3)


Therefore, in each step TWO-COUNTERS adds $m$ to $n$ and decrements $m$ by 1. Starting
from the initial state $(m, 0)$, TWO-COUNTERS carries out $m$ execution steps and
terminates at the final state $(0, m(m+1)/2)$, where $m(m+1)/2 = m + (m-1) + \cdots + 1$.
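The behavior of rule (3) can be checked with a few lines of Python. This is an illustrative simulation only, not how K executes the definition; the function names `step` and `run` are hypothetical, and states are modeled as plain integer pairs.

```python
def step(state):
    """Apply rule (3) once; return None when the side condition m > 0 fails."""
    m, n = state
    if m > 0:
        return (m - 1, n + m)
    return None


def run(initial):
    """Collect every snapshot, from the initial state until no rule applies."""
    trace = [initial]
    while (nxt := step(trace[-1])) is not None:
        trace.append(nxt)
    return trace


trace = run((100, 0))
# Number of execution steps and the final state.
print(len(trace) - 1, trace[-1])  # prints: 100 (0, 5050)
```

For the initial state (100, 0) the simulation takes 100 steps and stops at (0, 5050), which agrees with the closed form m(m+1)/2 for m = 100.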


3.2 Program Execution and Proof Parameters 


In the following, we show a concrete program execution trace of TWO-COUNTERS
starting from the initial state (100, 0): 


(100, 0), (99, 100), (98, 199),..., (1, 5049), (0, 5050) (4) 


To make K generate the above execution trace, we need to follow these steps: 


1. Prepare the initial state (100, 0) in a source file, say 100.two-counters.
2. Compile the formal semantics TWO-COUNTERS into a matching logic theory,
explained in Sect. 5.
3. Use the K execution tool krun and pass the source file to it:
$ krun 100.two-counters --depth N


The option --depth N tells K to execute for N steps and output the
(intermediate) snapshot. By letting N be 1, 2, ..., we collect all snapshots in Eq. (4).

The proof parameter of Eq. (4) includes the additional rewriting information 
for each execution step. That is, we need to know the rewrite rule that is applied 
and the corresponding substitution. In TWO-COUNTERS, there is only one rewrite
rule, and the substitution can be easily obtained by pattern matching, where we 
simply match the snapshot with the left-hand side of the rewrite rule. 
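A minimal sketch of how such a proof parameter could be assembled from the collected snapshots is shown below. The data layout (`RewriteStep`, the dictionary keys, and the rule label `RULE_NAME`) is hypothetical and chosen only for illustration; it is not the format that K emits. Snapshots are modeled as integer pairs, and matching against the left-hand side `<M, N>` is done directly on the pair.

```python
from dataclasses import dataclass

RULE_NAME = "two-counters-step"  # hypothetical label for the single rewrite rule


@dataclass
class RewriteStep:
    rule: str           # which rewrite rule was applied
    substitution: dict  # theta, mapping rule variables to values


def match_lhs(snapshot):
    """Match the left-hand side <M, N>; None if the condition M > 0 fails."""
    m, n = snapshot
    return {"M": m, "N": n} if m > 0 else None


def apply_rhs(theta):
    """Instantiate the right-hand side <M - 1, N + M> with theta."""
    return (theta["M"] - 1, theta["N"] + theta["M"])


def proof_parameter(snapshots):
    """Pair the execution trace with the rewriting information of every step."""
    steps = []
    for cur, nxt in zip(snapshots, snapshots[1:]):
        theta = match_lhs(cur)
        if theta is None or apply_rhs(theta) != nxt:
            raise ValueError(f"no rule instance rewrites {cur} to {nxt}")
        steps.append(RewriteStep(RULE_NAME, theta))
    return {"trace": list(snapshots), "steps": steps}


param = proof_parameter([(2, 0), (1, 2), (0, 3)])
print(param["steps"][0].substitution)  # prints: {'M': 2, 'N': 0}
```

The check inside `proof_parameter` rejects any pair of snapshots that is not an instance of the rule, so a trace is only accepted when every step is justified by a rule and a substitution.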

Note that we regard K as a “black box”. We are not interested in its complex 
internal algorithms. Instead, we hide such complexity by letting K generate proof 
parameters that include enough information for proof object generation. This 
way, we create a separation of concerns between K and proof object generation. 
K can aim at optimizing the performance of the autogenerated language tools, 
without making proof object generation more complex. 


4 Matching Logic and Its Formalization 


We review the syntax and proof system of matching logic—the logical foundation 
of K. Then, we discuss its formalization, which is our main technical contribution 
and is a critical component of the proof objects we generate for K (see Sect. 2). 


4.1 Matching Logic Overview 


Matching logic was proposed in [23] as a means to specify and reason about
programs compactly and modularly. The key concept is its formulas, called
patterns, which are used to specify program syntax and semantics in a uniform way.
Matching logic is known for its simplicity and rich expressiveness. In [4-6, 22],
the authors developed matching logic theories that capture FOL, FOL-lfp,
separation logic, modal logic, temporal logics, Hoare logic, λ-calculus, type systems,
etc. In Sect. 5, we discuss the matching logic theories that capture K.

The syntax of matching logic is parametric in two sets of variables $EV$ and
$SV$. We call $EV$ the set of element variables, denoted $x, y, \ldots$, and $SV$ the set
of set variables, denoted $X, Y, \ldots$.


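Definition 1. A (matching logic) signature $\Sigma$ is a set of (constant) symbols.
The set of $\Sigma$-patterns, denoted PATTERN($\Sigma$), is inductively defined as follows:


$\varphi ::= x \mid X \mid \sigma \mid \varphi_1\,\varphi_2 \mid \bot \mid \varphi_1 \to \varphi_2 \mid \exists x.\,\varphi \mid \mu X.\,\varphi$

where in $\mu X.\,\varphi$ we require that $\varphi$ has no negative occurrences of $X$.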
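The side condition on least fixpoints can be made concrete with a small checker. The sketch below assumes a hypothetical encoding of patterns as tagged Python tuples (`("evar", x)`, `("svar", X)`, `("sym", s)`, `("app", p, q)`, `("bot",)`, `("imp", p, q)`, `("ex", x, p)`, `("mu", X, p)`); this encoding is ours and is unrelated to the Metamath formalization. An occurrence is taken to be negative when it lies on the left of an odd number of implications.

```python
def has_negative_occurrence(X, pat, positive=True):
    """True if set variable X occurs free in pat at a negative position."""
    tag = pat[0]
    if tag == "svar":
        return pat[1] == X and not positive
    if tag in ("evar", "sym", "bot"):
        return False
    if tag == "app":
        return (has_negative_occurrence(X, pat[1], positive)
                or has_negative_occurrence(X, pat[2], positive))
    if tag == "imp":
        # The left-hand side of an implication flips polarity.
        return (has_negative_occurrence(X, pat[1], not positive)
                or has_negative_occurrence(X, pat[2], positive))
    if tag == "ex":
        return has_negative_occurrence(X, pat[2], positive)
    if tag == "mu":
        if pat[1] == X:  # X is rebound here, so it is not free below
            return False
        return has_negative_occurrence(X, pat[2], positive)
    raise ValueError(f"unknown pattern tag: {tag}")


def well_formed(pat):
    """Check the positivity requirement of Definition 1 for every mu-binder."""
    tag = pat[0]
    if tag in ("evar", "svar", "sym", "bot"):
        return True
    if tag in ("app", "imp"):
        return well_formed(pat[1]) and well_formed(pat[2])
    if tag == "ex":
        return well_formed(pat[2])
    if tag == "mu":
        return (not has_negative_occurrence(pat[1], pat[2])
                and well_formed(pat[2]))
    raise ValueError(f"unknown pattern tag: {tag}")


X, BOT = ("svar", "X"), ("bot",)
print(well_formed(("mu", "X", ("imp", X, BOT))))                # prints: False
print(well_formed(("mu", "X", ("imp", ("imp", X, BOT), BOT))))  # prints: True
```

The first example is rejected because X sits directly on the left of an implication; in the second, X is under two such positions and is therefore positive.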


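Thus, element variables, set variables, and symbols are patterns. $\varphi_1\,\varphi_2$ is a
pattern, called application, where the first argument is applied to the second. We
have propositional connectives $\bot$ and $\varphi_1 \to \varphi_2$, existential quantification
$\exists x.\,\varphi$, and the least fixpoints $\mu X.\,\varphi$, from which the following notations are defined:


\[
\begin{array}{lll}
\neg\varphi \equiv \varphi \to \bot & \top \equiv \neg\bot & \varphi_1 \wedge \varphi_2 \equiv \neg(\neg\varphi_1 \vee \neg\varphi_2) \\
\varphi_1 \vee \varphi_2 \equiv \neg\varphi_1 \to \varphi_2 & \forall x.\,\varphi \equiv \neg\exists x.\,\neg\varphi & \nu X.\,\varphi \equiv \neg\mu X.\,\neg\varphi[\neg X/X]
\end{array}
\]


\[
\begin{array}{ll}
x[\psi/x] = \psi & y[\psi/x] = y \ \text{ if } y \neq x \\
\sigma[\psi/x] = \sigma & (\varphi_1 \to \varphi_2)[\psi/x] = \varphi_1[\psi/x] \to \varphi_2[\psi/x] \\
\bot[\psi/x] = \bot & (\varphi_1\,\varphi_2)[\psi/x] = (\varphi_1[\psi/x])\,(\varphi_2[\psi/x]) \\
(\exists x.\,\varphi)[\psi/x] = \exists x.\,\varphi & (\exists y.\,\varphi)[\psi/x] = \exists z.\,\varphi[z/y][\psi/x] \ \text{ for fresh } z \\
(\mu X.\,\varphi)[\psi/x] = \mu Z.\,\varphi[Z/X][\psi/x] \ \text{ for fresh } Z &
\end{array}
\]

Fig. 4. Capture-free substitution is defined in the usual way and formalized later in
Sect. 4.2 as a part of our proof objects.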


We use FV(φ) to denote the free variables of φ, and φ[ψ/x] and φ[ψ/X] to denote
capture-free substitution. Their (usual) definitions are listed in Fig. 4.

Matching logic has a pattern matching semantics, where a pattern φ is
interpreted as the set of elements that match it. For example, φ_1 ∧ φ_2 is the
pattern that is matched by those elements matching both φ_1 and φ_2. Matching
logic semantics is not needed for proof object generation, so we refer the
reader to [5, 22] for it.

We show the matching logic proof system in Fig. 5, which defines the
provability relation, written Γ ⊢ φ, meaning that φ can be proved using the proof
system, with patterns in Γ added as additional axioms. We call Γ a matching
logic theory. The proof system is a main component of proof objects. To
understand it, we first need to define application contexts.


Definition 2. A context is a pattern C with a hole variable □. We write C[φ] ≡
C[φ/□] as the result of context plugging. We call C an application context, if


1. C ≡ □ is the identity context; or
2. C ≡ φ C′ or C ≡ C′ φ, where C′ is an application context and □ ∉ FV(φ).


That is, the path from the root to □ in C has only applications.

The proof rules are sound and can be divided into 4 categories: FOL
reasoning, frame reasoning, fixpoint reasoning, and some technical rules. The FOL
reasoning rules provide (complete) FOL reasoning (see, e.g., [25]). The frame
reasoning rules state that application contexts are commutative with disjunctive
connectives such as ∨ and ∃. The fixpoint reasoning rules support the
standard fixpoint reasoning as in modal μ-calculus [17]. The technical proof rules
are needed for some completeness results (see [5] for details).


4.2 Formalizing Matching Logic 


We discuss the formalization of matching logic, which is our first main contri- 
bution and forms an important component in our proof objects (see Sect. 2). 
FOL Rules
  (Propositional 1)     φ → (ψ → φ)
  (Propositional 2)     (φ → (ψ → θ)) → ((φ → ψ) → (φ → θ))
  (Propositional 3)     ((φ → ⊥) → ⊥) → φ
  (Modus Ponens)        from φ and φ → ψ, infer ψ
  (∃-Quantifier)        φ[y/x] → ∃x. φ
  (∃-Generalization)    from φ → ψ, infer (∃x. φ) → ψ, if x ∉ FV(ψ)

Frame Rules
  (Propagation_⊥)       C[⊥] → ⊥
  (Propagation_∨)       C[φ ∨ ψ] → C[φ] ∨ C[ψ]
  (Propagation_∃)       C[∃x. φ] → ∃x. C[φ], with x ∉ FV(C)
  (Framing)             from φ → ψ, infer C[φ] → C[ψ]

Fixpoint Rules
  (Substitution)        from φ, infer φ[ψ/X]
  (Prefixpoint)         φ[(μX. φ)/X] → μX. φ
  (Knaster-Tarski)      from φ[ψ/X] → ψ, infer (μX. φ) → ψ

Technical Rules
  (Existence)           ∃x. x
  (Singleton)           ¬(C_1[x ∧ φ] ∧ C_2[x ∧ ¬φ])


Fig. 5. Matching logic proof system (where C, C_1, C_2 are application contexts).


Metamath [20] is a tiny language to state abstract mathematics and their
proofs in a machine-checkable style. In our work, we use Metamath to formalize
matching logic and to encode our proof objects. We choose Metamath for its
simplicity and fast proof checking: Metamath proof checkers are often hundreds
of lines of code and can proof-check thousands of theorems in a second.

Our formalization closely follows Sect. 4.1. We formalize the syntax of
patterns and the proof system. We also need to formalize some metalevel operations
such as free variables and capture-free substitution. An innovative contribution
is a generic way of handling notations (such as ¬ and ∧) in matching logic. The
resulting formalization has only 245 lines of code, which we show in [16]. This
formalization of matching logic is the main trust base of our proof objects.


Metamath Overview. We use an extract of our formalization of matching 
logic (Fig.6) to explain the basic concepts in Metamath. At a high level, a 
Metamath source file consists of a list of statements. The main ones are: 


1. constant statements ( $c ) that declare Metamath constants; 




 1  $c \imp ( ) #Pattern |- $.
 2
 3  $v ph1 ph2 ph3 $.
 4  ph1-is-pattern $f #Pattern ph1 $.
 5  ph2-is-pattern $f #Pattern ph2 $.
 6  ph3-is-pattern $f #Pattern ph3 $.
 7  imp-is-pattern
 8    $a #Pattern ( \imp ph1 ph2 ) $.
 9
10  axiom-1
11    $a |- ( \imp ph1 ( \imp ph2 ph1 ) ) $.
12
13  axiom-2
14    $a |- ( \imp ( \imp ph1 ( \imp ph2 ph3 ) )
15                 ( \imp ( \imp ph1 ph2 )
16                        ( \imp ph1 ph3 ) ) ) $.
17
18  ${
19    rule-mp.0 $e |- ( \imp ph1 ph2 ) $.
20    rule-mp.1 $e |- ph1 $.
21    rule-mp   $a |- ph2 $.
22  $}
23  imp-refl $p |- ( \imp ph1 ph1 )
24  $=
25    ph1-is-pattern ph1-is-pattern
    ...   (proof labels on lines 26-42 omitted)
43    ph1-is-pattern axiom-1
44  $.


Fig. 6. An extract of the Metamath formalization of matching logic. 


2. variable statements ( $v ) that declare Metamath variables, and floating
statements ( $f ) that declare their intended ranges;

3. axiomatic statements ( $a ) that declare Metamath axioms, which can be
associated with some essential statements ( $e ) that declare the premises;

4. provable statements ( $p ) that state a Metamath theorem and its proof.


Figure 6 defines the fragment of matching logic with only implications. We
declare five constants in a row in line 1, where \imp, (, and ) build the
syntax, #Pattern is the type of patterns, and |- is the provability relation. We
declare three metavariables of patterns in lines 3-6, and the syntax of implication
φ_1 → φ_2 as ( \imp ph1 ph2 ) in line 7. Then, we define matching logic proof rules
as Metamath axioms. For example, lines 18-22 define the rule (Modus Ponens).

In line 23, we show an example (meta-)theorem and its formal proof in
Metamath. The theorem states that ⊢ φ_1 → φ_1 holds, and its proof (lines 25-43) is
a sequence of labels referring to the previous axiomatic/provable statements.

Metamath proofs are very easy to proof-check, which is why we use Metamath in
our work. The proof checker reads the labels in order and maintains a proof stack
S, which is initially empty. When a label l is read, the checker pops the premise
statements of l from S and pushes the (instantiated) conclusion of l. When all
labels are consumed, the checker checks whether S has exactly one statement, which
should be the original proof goal. If so, the proof is checked. Otherwise, it fails.

As an example, we look at the first 5 labels of the proof in Fig. 6, line 25: 


                  // Initially, the proof stack S is empty
ph1-is-pattern    // S = [ #Pattern ph1 ]
ph1-is-pattern    // S = [ #Pattern ph1 ; #Pattern ph1 ]
ph1-is-pattern    // S = [ #Pattern ph1 ; #Pattern ph1 ; #Pattern ph1 ]
imp-is-pattern    // S = [ #Pattern ph1 ; #Pattern ( \imp ph1 ph1 ) ]
imp-is-pattern    // S = [ #Pattern ( \imp ph1 ( \imp ph1 ph1 ) ) ]
where we show the stack status in comments. The first label ph1-is-pattern
refers to a $f -statement without premises, so nothing is popped off, and the
corresponding statement #Pattern ph1 is pushed to the stack. The same happens
for the second and third labels. The fourth label imp-is-pattern refers to a
$a -statement with two metavariables of patterns, and thus has 2 premises.
Therefore, the top two statements in S are popped off, and the corresponding
conclusion #Pattern ( \imp ph1 ph1 ) is pushed to S. The last label does the same,
popping off two premises and pushing #Pattern ( \imp ph1 ( \imp ph1 ph1 ) ) to
S. Thus, these five proof steps prove the wellformedness of φ_1 → (φ_1 → φ_1).
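
The stack discipline described above can be sketched in a few lines of Python. This is a simplified illustration and not a real Metamath checker: it handles only floating hypotheses, ignores essential hypotheses, $d conditions, and scoping, and the names `DB`, `stmt`, `substitute`, and `check` are hypothetical helpers introduced here.

```python
def stmt(text):
    """Split a Metamath-style statement into a tuple of tokens."""
    return tuple(text.split())

# label -> (floating hypotheses as (typecode, variable) pairs, conclusion)
DB = {
    "ph1-is-pattern": ([], stmt("#Pattern ph1")),
    "imp-is-pattern": ([("#Pattern", "ph1"), ("#Pattern", "ph2")],
                       stmt(r"#Pattern ( \imp ph1 ph2 )")),
    "axiom-1": ([("#Pattern", "ph1"), ("#Pattern", "ph2")],
                stmt(r"|- ( \imp ph1 ( \imp ph2 ph1 ) )")),
}

def substitute(statement, binding):
    """Replace every bound metavariable token by its token sequence."""
    out = []
    for token in statement:
        out.extend(binding.get(token, (token,)))
    return tuple(out)

def check(proof, goal, db=DB):
    """Replay the labels on a stack; succeed iff exactly the goal remains."""
    stack = []
    for label in proof:
        hyps, conclusion = db[label]
        if len(stack) < len(hyps):
            return False                      # not enough premises on the stack
        cut = len(stack) - len(hyps)
        args, stack = stack[cut:], stack[:cut]
        binding = {}
        for (typecode, var), arg in zip(hyps, args):
            if arg[0] != typecode:
                return False                  # premise has the wrong typecode
            binding[var] = arg[1:]            # instantiate the metavariable
        stack.append(substitute(conclusion, binding))
    return stack == [goal]

# The five labels discussed above prove wellformedness of ph1 -> (ph1 -> ph1).
wellformed = ["ph1-is-pattern"] * 3 + ["imp-is-pattern"] * 2
goal = stmt(r"#Pattern ( \imp ph1 ( \imp ph1 ph1 ) )")
print(check(wellformed, goal))  # prints True
```

Dropping the last label leaves two statements on the stack, so the same call returns False, which mirrors the "exactly one statement" condition.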


Formalizing Matching Logic Syntax. Now, we go through the formalization 
of matching logic and emphasize some highlights. See [5,6,22] for full detail. 
The syntax of patterns is formalized below, following Definition 1: 


$c \bot \imp \app \exists \mu ( ) $.


var-is-pattern    $a #Pattern xX $.
symbol-is-pattern $a #Pattern sg0 $.
bot-is-pattern    $a #Pattern \bot $.
imp-is-pattern    $a #Pattern ( \imp ph0 ph1 ) $.
app-is-pattern    $a #Pattern ( \app ph0 ph1 ) $.


exists-is-pattern $a #Pattern ( \exists x ph0 ) $.
${ mu-is-pattern.0 $e #Positive X ph0 $.
   mu-is-pattern   $a #Pattern ( \mu X ph0 ) $. $}


Note that we omit the declarations of metavariables (such as xX, sg0, ...)
because their meaning can be easily inferred. The only nontrivial case above is
mu-is-pattern , where we require that ph0 is positive in X, discussed below.


Metalevel Assertions. To formalize matching logic, we need the following 
metalevel operations and/or assertions: 


1. positive (and negative) occurrences of variables;
2. free variables;
3. capture-free substitution;
4. application contexts;
5. notations.


Item 1 is needed to define the syntax of μX. φ, while Items 2-5 are needed
to define the proof system (Fig. 5). Here, we show how to define capture-free
substitution as an example. Notations are discussed below.

To formalize capture-free substitution, we first define a Metamath constant 


$c #Substitution $. 


that serves as an assertion symbol: #Substitution ph ph' ph'' xX holds iff
ph ≡ ph'[ph''/xX]. Then, we can define substitution following Fig. 4. The only
nontrivial case is when ph' is ∃x. φ or μX. φ, in which case α-renaming is
required to avoid variable capture. We show the case when ph' is ∃x. φ below:
substitution-exists-shadowed
  $a #Substitution ( \exists x ph1 ) ( \exists x ph1 ) ph0 x $.
${ $d xX x $.
   $d y ph0 $.
   substitution-exists.0 $e #Substitution ph2 ph1 y x $.
   substitution-exists.1 $e #Substitution ph3 ph2 ph0 xX $.
   substitution-exists
     $a #Substitution ( \exists y ph3 ) ( \exists x ph1 ) ph0 xX $. $}


There are two cases, as expected from Fig. 4. substitution-exists-shadowed is
when the substitution is shadowed. substitution-exists is the general case, where
we first rename x to a fresh variable y and then continue the substitution.
The $d -statements state that the substitution is not shadowed and y is fresh.
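
To make the two cases concrete, the following Python sketch implements the equations of Fig. 4 on a small pattern datatype. It is only an illustration of capture-free substitution, not part of the Metamath formalization; the tuple encoding of patterns and the helper names `free_vars`, `fresh`, and `subst` are assumptions made for this example.

```python
import itertools

# Patterns as nested tuples (an encoding assumed for this sketch):
#   ("evar", x)  ("svar", X)  ("sym", s)  ("bot",)
#   ("imp", p, q)  ("app", p, q)  ("exists", x, p)  ("mu", X, p)

def free_vars(p):
    tag = p[0]
    if tag in ("evar", "svar"):
        return {p}
    if tag in ("sym", "bot"):
        return set()
    if tag in ("imp", "app"):
        return free_vars(p[1]) | free_vars(p[2])
    kind = "evar" if tag == "exists" else "svar"
    return free_vars(p[2]) - {(kind, p[1])}

def fresh(kind, avoid):
    """Return a variable of the given kind that does not occur in `avoid`."""
    prefix = "z" if kind == "evar" else "Z"
    for i in itertools.count():
        candidate = (kind, f"{prefix}{i}")
        if candidate not in avoid:
            return candidate

def subst(phi, psi, var):
    """Compute phi[psi/var], where var is ("evar", x) or ("svar", X)."""
    tag = phi[0]
    if tag in ("evar", "svar"):
        return psi if phi == var else phi
    if tag in ("sym", "bot"):
        return phi
    if tag in ("imp", "app"):
        return (tag, subst(phi[1], psi, var), subst(phi[2], psi, var))
    kind = "evar" if tag == "exists" else "svar"
    bound, body = (kind, phi[1]), phi[2]
    if bound == var:
        return phi                      # shadowed case: nothing to substitute
    # general case: rename the bound variable to a fresh one, then continue
    z = fresh(kind, free_vars(body) | free_vars(psi) | {var})
    return (tag, z[1], subst(subst(body, z, bound), psi, var))

x, y = ("evar", "x"), ("evar", "y")
phi = ("exists", "y", ("app", x, y))    # ∃y. x y
result = subst(phi, y, x)               # (∃y. x y)[y/x]
print(result)
```

In the example, the substituted y stays free in the result because the bound variable is first renamed, which is the role of the premise substitution-exists.0 above.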


Supporting Notations. Notations (e.g., ¬ and ∧) play an important role
in matching logic. Many proof rules such as (Propagation_∨) and (Singleton) use
notations (see Fig. 5). However, Metamath has no built-in support for notations.
To define a notation, say ¬φ ≡ φ → ⊥, we need to (1) declare a constant \not
and add it to the pattern syntax; (2) define the equivalence relation ¬φ ≡ φ → ⊥;
and (3) add a new case for \not to every metalevel assertion. While (1) and (2)
are reasonable, we want to avoid (3) because there are many metalevel assertions
and thus it creates duplication.
Therefore, we implement an innovative and generic method that allows us to
define any notation in a compact way. Our method is to declare a new constant
#Notation and use it to capture the congruence relation of sugaring/desugaring.
Using #Notation , it takes only three lines to define the notation ¬φ ≡ φ → ⊥:


$c \not $.
not-is-pattern $a #Pattern ( \not ph0 ) $.
not-is-sugar   $a #Notation ( \not ph0 ) ( \imp ph0 \bot ) $.

To make the above work, we need to state that #Notation is a congruence 
relation with respect to the syntax of patterns and all the other metalevel asser- 
tions. Firstly, we state that it is reflexive, symmetric, and transitive: 


notation-reflexivity $a #Notation ph0 ph0 $.

${ notation-symmetry.0 $e #Notation ph0 ph1 $.
   notation-symmetry   $a #Notation ph1 ph0 $. $}

${ notation-transitivity.0 $e #Notation ph0 ph1 $.
   notation-transitivity.1 $e #Notation ph1 ph2 $.
   notation-transitivity   $a #Notation ph0 ph2 $. $}


And the following is an example where we state that #Notation is a congruence 
with respect to provability: 


${ notation-provability.0 $e #Notation ph0 ph1 $.
   notation-provability.1 $e |- ph0 $.
   notation-provability   $a |- ph1 $. $}

This way, we only need a fixed number of statements that state that #Notation
is a congruence, which makes defining notations more compact and less repetitive.
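
One way to picture the #Notation relation is as "equal after fully desugaring". The sketch below is an illustration only, under the assumption that each notation is given by a single, terminating expansion; the Metamath formalization instead derives the relation from the congruence axioms. The table `SUGAR` and the functions `desugar` and `notation_equiv` are hypothetical names.

```python
# Patterns as nested tuples such as ("\\imp", p, q); atoms are plain strings.
# Each notation maps to its one-step expansion (the "sugar" axiom).
SUGAR = {
    "\\not": lambda p: ("\\imp", p, "\\bot"),
    "\\or":  lambda p, q: ("\\imp", ("\\not", p), q),
    "\\and": lambda p, q: ("\\not", ("\\or", ("\\not", p), ("\\not", q))),
}

def desugar(pattern):
    """Expand all notations, innermost arguments first."""
    if isinstance(pattern, str):
        return pattern
    head, *args = pattern
    args = [desugar(a) for a in args]
    if head in SUGAR:
        return desugar(SUGAR[head](*args))
    return (head, *args)

def notation_equiv(p, q):
    """Decide whether p and q differ only by sugaring/desugaring."""
    return desugar(p) == desugar(q)

print(desugar(("\\not", "ph0")))   # prints ('\\imp', 'ph0', '\\bot')
```

Adding a notation then amounts to adding one entry to the table, which parallels the three-line definition of \not above.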


Formalizing Proof System. With metalevel assertions and notations, it is
now straightforward to formalize matching logic proof rules. We have seen the
formalization of (Modus Ponens) in Fig. 6. In the following, we formalize the
fixpoint proof rule (Knaster-Tarski), whose premises use capture-free substitution:


${ rule-kt.0 $e #Substitution ph0 ph1 ph2 X $.
   rule-kt.1 $e |- ( \imp ph0 ph2 ) $.
   rule-kt   $a |- ( \imp ( \mu X ph1 ) ph2 ) $. $}


5 Compiling K into Matching Logic 


To execute programs using K, we need to compile the K language definition
for language L into a matching logic theory, written Γ^L (see Sect. 3.2). In this
section, we discuss this compilation process and show how to formalize Γ^L.


5.1 Basic Matching Logic Theories 


Firstly, we discuss the basic matching logic theories that are required by Γ^L. We
discuss the theories of equality, sorts (and sorted functions), and rewriting.


Theory of Equality. By equality, we mean a (predicate) pattern φ_1 = φ_2 that
holds (i.e., equals ⊤) iff φ_1 equals φ_2, and fails (i.e., equals ⊥) otherwise.
We first need to define definedness ⌈φ⌉, which is a predicate pattern that states
that φ is defined, i.e., φ is matched by at least one element: φ is not ⊥.


Definition 3. Consider a symbol ⌈_⌉ ∈ Σ, called the definedness symbol. We
write ⌈φ⌉ for the application ⌈_⌉ φ. In addition, we define the following axiom:


(Definedness) ⌈x⌉ (5)


(Definedness) states that any element x is defined. Using the definedness sym- 
bol, we can define many important mathematical instruments, including equality, 
as the following notations: 


⌊φ⌋ ≡ ¬⌈¬φ⌉ // Totality             φ_1 = φ_2 ≡ ⌊φ_1 ↔ φ_2⌋ // Equality
φ_1 ⊆ φ_2 ≡ ⌊φ_1 → φ_2⌋ // Inclusion    x ∈ φ ≡ ⌈x ∧ φ⌉ // Membership


[22, Section 5.1] shows that the above indeed capture the intended semantics. 




Theory of Sorts. Matching logic is not sorted, but K is. To compile K into
matching logic, we need a systematic way of dealing with sorts. We follow the
“sort-as-predicate” paradigm to handle sorts and sorted functions in matching
logic, following [4, 6]. The main idea is to define a symbol ⟦_⟧ ∈ Σ, called
the inhabitant symbol, and use the inhabitant pattern ⟦s⟧ (an abbreviation for the
application ⟦_⟧ s) to represent the inhabitant set of sort s. For example, to define
a sort Nat, we define a corresponding symbol Nat that represents the sort name,
and use ⟦Nat⟧ to represent the set of all natural numbers.

Sorted functions can be axiomatized as special matching logic symbols. For
example, the successor function succ of natural numbers is a symbol with axiom:


∀x. x ∈ ⟦Nat⟧ → ∃y. y ∈ ⟦Nat⟧ ∧ succ x = y (6)


In other words, for any x in the inhabitant set of Nat, there exists a y in the
inhabitant set of Nat such that succ x equals y. Thus, succ is a sorted function
from Nat to Nat.


Theory of Rewriting. Recall that in K, the formal language semantics is
defined using rewrite rules, which essentially define a transition system over
computation configurations. In matching logic, a transition system can be
captured by only one symbol • ∈ Σ, called one-path next, with the intuition that
for any configuration φ, •φ is matched by all configurations that can go to φ in
one step. In other words, φ is reached on one path in the next configuration.

Program execution is the reflexive and transitive closure of one-path next.
Formally, we define program execution (i.e., rewriting) as follows:


◇φ ≡ μX. φ ∨ •X          // Eventually; equals φ ∨ •φ ∨ ••φ ∨ ...
φ_1 ⇒ φ_2 ≡ φ_1 → ◇φ_2    // Rewriting
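
On a finite transition system, the two definitions above can be computed directly, which may help build intuition. The sketch below interprets a pattern as the set of configurations matching it and computes the least fixpoint by iteration from the empty set. The dictionary encoding of the transition system and the names `one_path_next` and `eventually` are assumptions of this example; the sample system is the run of the two-counters rule from ⟨3, 0⟩.

```python
def one_path_next(step, targets):
    """Configurations that can reach some configuration in `targets` in one step."""
    return {c for c, successors in step.items() if successors & targets}

def eventually(step, targets):
    """Least fixpoint of X = targets ∪ one_path_next(X), by iteration from ∅."""
    targets = set(targets)
    reached = set()
    while True:
        new = targets | one_path_next(step, reached)
        if new == reached:
            return reached
        reached = new

# Transition system induced by <m, n> => <m - 1, n + m> if m > 0, from <3, 0>.
step = {
    (3, 0): {(2, 3)},
    (2, 3): {(1, 5)},
    (1, 5): {(0, 6)},
    (0, 6): set(),
}

final = {(0, 6)}
# <3, 0> rewrites to <0, 6> iff <3, 0> matches "eventually <0, 6>".
print((3, 0) in eventually(step, final))  # prints True
```

The iteration visits exactly the unfoldings φ, φ ∨ •φ, φ ∨ •φ ∨ ••φ, and so on, and stops once no new configuration is added.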


5.2 Kore: The Intermediate Between K and Matching Logic 


The K compilation tool kompile (explained shortly) is what compiles a K
language definition into a matching logic theory Γ^L, written in a formal language
called Kore. For legacy reasons, the Kore language is not the same as the syntax
of matching logic (Definition 1), but an axiomatic extension with equality, sorts,
and rewriting. Thus, to formalize Γ^L in proof objects, we need to (1) formalize
the matching logic theories of equality, sorts, and rewriting; and (2)
automatically translate Kore definitions into the corresponding matching logic
theories. Figure 7 shows the 2-phase translation from K to matching logic, via Kore.


Phase 1: From K to Kore. To compile a K definition such as two-counters.k in 
Fig. 3, we pass it to the K compilation tool kompile as follows: 
$ kompile two-counters.k 
The result is a compiled Kore definition two-counters.kore . We show the auto- 
generated Kore axiom in Fig. 7 that corresponds to the rewrite rule in Eq. (3). As 
we can see, Kore is a much lower-level language than K, where the programming 
language concrete syntax and K’s front end syntax are parsed and replaced by 
the abstract syntax trees, represented by the constructor terms. 
K Definition
    --(the K compilation tool kompile)-->
Kore Definition
    --(an automatic encoder from Kore to matching logic)-->
Matching Logic Theory (formalized in Metamath)

K Definition:
  rule <M, N>
    => <M -Int 1, N +Int M>
    requires M >Int 0

Kore Definition:
  axiom \rewrites(
    \and(\pair(M, N), \gte(M, 0)),
    \pair(\minus(M, 1), \plus(N, M)))

Matching Logic Theory (formalized in Metamath):
  $a |- ( \rewrites
    ( \and ( \pair M N ) ( \gte M 0 ) )
    ( \pair ( \minus M 1 ) ( \plus N M ) ) ) $.


Fig. 7. Automatic translation from K to matching logic, via Kore 


Phase 2: From Kore to Matching Logic. We develop an automatic encoder that 
translates Kore syntax into matching logic patterns. Since Kore is essentially the 
theory of equality, sorts, and rewriting, we can define the syntactic constructs 
of the Kore language as notations, using the basic theories in Sect. 5.1. 
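
The syntactic part of this translation, as displayed in Fig. 7, is essentially a change from call syntax to fully parenthesized prefix syntax. The following Python sketch reproduces that step for the simplified Kore notation used in Fig. 7. It is not the authors' encoder and does not handle real Kore (for instance, sort parameters); `TOKEN`, `parse`, `to_metamath`, and `encode` are hypothetical names.

```python
import re

# Tokens: identifiers (optionally starting with a backslash), parentheses, commas.
TOKEN = re.compile(r"\\?\w+|[(),]")

def parse(tokens, pos=0):
    """Parse `head` or `head(arg, ..., arg)`; return ((head, args), next position)."""
    head = tokens[pos]
    pos += 1
    args = []
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        while tokens[pos] != ")":
            arg, pos = parse(tokens, pos)
            args.append(arg)
            if tokens[pos] == ",":
                pos += 1
        pos += 1                          # skip the closing parenthesis
    return (head, args), pos

def to_metamath(term):
    """Print a parsed term as a space-separated, parenthesized prefix expression."""
    head, args = term
    if not args:
        return head
    return "( " + " ".join([head] + [to_metamath(a) for a in args]) + " )"

def encode(kore_text):
    tokens = TOKEN.findall(kore_text)
    term, pos = parse(tokens)
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return to_metamath(term)

kore = r"\rewrites(\and(\pair(M, N), \gte(M, 0)), \pair(\minus(M, 1), \plus(N, M)))"
print(encode(kore))
```

Applied to the Kore axiom of Fig. 7, this yields the pattern that appears after |- in the Metamath column of the figure.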


6 Generating Proof Objects for Program Execution 


In this section, we discuss how to generate proof objects for program execution, 
based on the formalization of matching logic and K/Kore in Sects.4 and 5. 
The key step is to generate proof objects for one-step executions, which are 
then put together to build the proof objects for multi-step executions using the 
transitivity of the rewriting relation. Thus, we focus on the process of generating 
proof objects for one-step executions from the proof parameters provided by K. 


6.1 Problem Formulation

Consider the following K definition that consists of K (conditional) rewrite rules:

S = {t_k ∧ p_k ⇒ s_k | k = 1, 2, ..., K}


where t_k and s_k are the left- and right-hand sides of the rewrite rule, respectively,
and p_k is the rewriting condition. Consider the following execution trace:


φ_0, φ_1, ..., φ_n (7)

where φ_0, ..., φ_n are snapshots. We let K generate the following proof parameter:

Θ = (k_0, θ_0), ..., (k_{n-1}, θ_{n-1}) (8)


where for each 0 ≤ i < n, k_i denotes the rewrite rule that is applied on φ_i
(1 ≤ k_i ≤ K) and θ_i denotes the corresponding substitution such that t_{k_i}θ_i = φ_i.
As an example, the rewrite rule of TWO-COUNTERS , restated below:


⟨m, n⟩ ⇒ ⟨m − 1, n + m⟩ if m > 0 // Same as Eq. (3)


has the left-hand side t_k = ⟨m, n⟩, the right-hand side s_k = ⟨m − 1, n + m⟩, and
the condition p_k = m > 0. Note that the right-hand side pattern s_k contains
the arithmetic operations “+” and “−” that can be further evaluated to a value,
if concrete instances of the variables m and n are given. Generally speaking,
the right-hand side of a rewrite rule may include (built-in or user-defined)
functions that are not constructors and thus can be further evaluated. We call such
an evaluation process a simplification.
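
For the two-counters example, the execution trace (7) and the proof parameter (8) can be computed by a few lines of Python. This is only a sketch of what the formulation above asks K to produce, not K's actual interface; the function name `run_two_counters` and the dictionary representation of substitutions are assumptions.

```python
def run_two_counters(m, n):
    """Return (snapshots, proof_parameter) for the single rule
    <m, n> => <m - 1, n + m> if m > 0."""
    snapshots = [(m, n)]
    proof_parameter = []
    while m > 0:                                         # rewriting condition holds
        proof_parameter.append((1, {"m": m, "n": n}))    # (k_i, theta_i), with k_i = 1
        m, n = m - 1, n + m                              # right-hand side, simplified
        snapshots.append((m, n))
    return snapshots, proof_parameter

snapshots, theta = run_two_counters(3, 0)
print(snapshots)   # prints [(3, 0), (2, 3), (1, 5), (0, 6)]
print(theta[0])    # prints (1, {'m': 3, 'n': 0})
```

Each pair in the proof parameter records which rule fired and how its variables were instantiated, so there is one entry per step and one more snapshot than entries.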
6.2 Applying Rewrite Rules and Applying Simplifications


In the following, we list all proof objects for one-step executions.


Γ^L ⊢ φ_0 ⇒ s_{k_0}θ_0                // by applying t_{k_0} ∧ p_{k_0} ⇒ s_{k_0} using θ_0
Γ^L ⊢ s_{k_0}θ_0 = φ_1                // by simplifying s_{k_0}θ_0
...
Γ^L ⊢ φ_{n-1} ⇒ s_{k_{n-1}}θ_{n-1}    // by applying t_{k_{n-1}} ∧ p_{k_{n-1}} ⇒ s_{k_{n-1}} using θ_{n-1}
Γ^L ⊢ s_{k_{n-1}}θ_{n-1} = φ_n        // by simplifying s_{k_{n-1}}θ_{n-1}


As we can see, there are two types of proof objects: one that proves the results 
of applying rewrite rules and one that applies simplification. 


Applying Rewrite Rules. The main steps in proving Γ^L ⊢ φ_i ⇒ s_{k_i}θ_i are
(1) to instantiate the rewrite rule t_{k_i} ∧ p_{k_i} ⇒ s_{k_i} using the substitution


θ_i = [c_1/x_1, ..., c_m/x_m]


given in the proof parameter, and (2) to show that the (instantiated) rewriting
condition p_{k_i}θ_i holds. Here, x_1, ..., x_m are the variables that occur in the rewrite
rule and c_1, ..., c_m are terms by which we instantiate the variables. For (1), we
need to first prove the following lemma, called (Functional Substitution) in [5], which
states that ∀-quantification can be instantiated by functional patterns:


∀x_1, ..., x_m. t_{k_i} ∧ p_{k_i} ⇒ s_{k_i}     ∃y_1. φ_1 = y_1   ...   ∃y_m. φ_m = y_m
------------------------------------------------------------------------------------   y_1, ..., y_m fresh
                      t_{k_i}θ_i ∧ p_{k_i}θ_i ⇒ s_{k_i}θ_i


Intuitively, the premise ∃y_1. φ_1 = y_1 states that φ_1 (here, the term c_1
substituted for x_1) is a functional pattern because it equals some element y_1.

If Θ in Eq. (8) is the correct proof parameter, θ_i is the correct substitution
and thus t_{k_i}θ_i = φ_i. Therefore, to prove the original proof goal for one-step
execution, i.e., Γ^L ⊢ φ_i ⇒ s_{k_i}θ_i, we only need to prove that Γ^L ⊢ p_{k_i}θ_i, i.e., the
rewriting condition p_{k_i} holds under θ_i. This is done by simplifying p_{k_i}θ_i to ⊤,
discussed together with the simplification process in the following.


Applying Simplifications. K carries out simplification exhaustively before
trying to apply a rewrite rule, and simplifications are done by applying (oriented)
equations. Generally speaking, let s be a functional pattern and p → t = t′ be
a (conditional) equation. We say that s can be simplified w.r.t. p → t = t′ if
there is a sub-pattern s_0 of s (written s = C[s_0] where C is a context) and a
substitution θ such that s_0 = tθ and pθ holds. The resulting simplified pattern is
denoted C[t′θ]. Therefore, a proof object of the above simplification consists of
two proofs: Γ^L ⊢ s = C[t′θ] and Γ^L ⊢ pθ. The latter can be handled recursively,
by simplifying pθ to ⊤, so we only need to consider the former.

The main steps of proving Γ^L ⊢ s = C[t′θ] are the following:

1. to find C, s_0, θ, and t = t′ in Γ^L such that s = C[s_0] and s_0 = tθ; in other
words, s can be simplified w.r.t. t = t′ at the sub-pattern s_0;

2. to prove Γ^L ⊢ s_0 = t′θ by instantiating t = t′ using the substitution θ, using
the same (Functional Substitution) lemma as above;

3. to prove Γ^L ⊢ C[s_0] = C[t′θ] using the transitivity of equality.


Finally, we repeat the above one-step simplifications until no sub-patterns 
can be simplified further. The resulting proof objects are then put together by 
the transitivity of equality. 
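
The loop of one-step simplifications can be sketched as follows for the arithmetic on the right-hand side of the two-counters rule. The sketch records each step as a pair (before, after), standing in for the equation whose proof would be generated. It assumes only built-in plus and minus on integers as oriented equations; the term encoding and the names `OPS`, `simplify_once`, and `simplify` are hypothetical.

```python
# Terms are integers or tuples (head, arg1, arg2), e.g. ("plus", 0, 3).
OPS = {"plus": lambda a, b: a + b, "minus": lambda a, b: a - b}

def simplify_once(term):
    """Rewrite one innermost reducible sub-term; return None if there is none."""
    if not isinstance(term, tuple):
        return None
    head, *args = term
    for i, arg in enumerate(args):
        new_arg = simplify_once(arg)
        if new_arg is not None:
            # rebuild the surrounding context C around the rewritten sub-term
            return (head, *args[:i], new_arg, *args[i + 1:])
    if head in OPS and all(isinstance(a, int) for a in args):
        return OPS[head](*args)
    return None

def simplify(term):
    """Simplify exhaustively; return the normal form and the list of steps."""
    steps = []
    while True:
        new_term = simplify_once(term)
        if new_term is None:
            return term, steps
        steps.append((term, new_term))    # one equation: term = new_term
        term = new_term

# Right-hand side <m - 1, n + m> instantiated with m = 3, n = 0.
rhs = ("pair", ("minus", 3, 1), ("plus", 0, 3))
normal_form, steps = simplify(rhs)
print(normal_form)   # prints ('pair', 2, 3)
```

Consecutive steps share an endpoint, so the recorded equations chain from the original term to its normal form, which is where transitivity of equality is used.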


7 Discussion on Implementation 


As discussed in Sect. 2, a complete proof object for program execution (i.e.,
Γ^L ⊢ φ_init ⇒ φ_final) consists of (1) the formalization of matching logic and its
basic theories; (2) the formalization of Γ^L; and (3) the proofs of one-step and
multi-step program executions. In our implementation, (1) is developed manually 
because it is fixed for all programming languages and program executions. (2) 
and (3) are automatically generated by the algorithms in Sect. 6. 

During the (manual) development of (1), we needed to prove many basic 
matching logic (meta-)theorems as lemmas, such as (Functional Substitution) in 
Sect. 6.2. To ease the manual work, we developed an interactive theorem prover 
(ITP) for matching logic, which allows us to carry out higher-level interactive 
proofs that are later automatically translated into the lower-level Metamath 
proofs. We show the highlights of our ITP for matching logic in Sect. 7.1. 

In Sect. 7.2, we discuss the main limitations of our current preliminary imple- 
mentation. These limitations are planned to be addressed in future work. 


7.1 An Interactive Theorem Prover for Matching Logic 


Metamath proofs are low-level and not human readable (see, e.g., the proof of
⊢ φ → φ in Fig. 6). Metamath has its own interactive theorem prover (ITP), but
it is for general purposes and does not have specific support for matching logic. 
Therefore, we developed a new ITP for matching logic that has the following 
characteristic features: 


— Our ITP understands the syntax of matching logic patterns and has proof 
tactics to desugar notations in the proof goals; 

— Our ITP has an automatic proof tactic for propositional tautologies, based 
on the resolution method; 

— Our ITP allows dynamic proofs, meaning that new lemmas can be dynami- 
cally added during an interactive proof; this makes our ITP easier to use. 


When an interactive proof is finished, our ITP translates the higher-level
proof tactics into real Metamath formal proofs, which eases the manual
development. We do not describe the ITP in full in this paper; more detail
will appear in future publications.

7.2 Limitations and Threats to Validity


We discuss the trust base of the autogenerated proof objects by pointing out the 
main threats to validity, caused by the limitations of our preliminary implemen- 
tation. It should be noted that these limitations are about the implementation, 
and not our approach. We shall address these limitations in future work. 


Limitation 1: Need to Trust Kore. Our current implementation is based on 
the existing K compilation tool kompile that compiles K into Kore definitions. 
Recall that Kore is a (legacy) formal language with built-in support for equality,
sorts, and rewriting, and thus is different from (and more complex than) the syntax
of matching logic. By using Kore as the intermediate between K and matching
logic (Fig. 7), we need to trust Kore and the K compilation tool kompile .
In the future, we will eliminate Kore entirely from the picture and formalize 
K directly. To do that, we need to formalize the “front end matters” of K, such as 
concrete programming language syntax and K attributes, currently handled by 
kompile . That is, we need to formalize and generate proof objects for kompile . 


Limitation 2: Need to Trust Domain Reasoning. K has built-in support 
for domain reasoning such as integer arithmetic. Our current proof objects do 
not include the formal proofs of such domain reasoning, but instead regard them 
as assumed lemmas. In the future, we will incorporate the existing research on 
generating proof objects for SMT solvers [1] into our implementation, in order 
to generate proof objects also for domain reasoning; see also Sect. 9. 


Limitation 3: Do Not Support More Complex K Features. Our current
implementation only supports the core K features of defining programming lan- 
guage syntax and of defining formal semantics as rewrite rules. Some more com- 
plex features are not supported; the main ones are (1) the [strict] attributes 
that specify evaluation orders; and (2) the use of built-in collection datatypes, 
such as lists, sets, and maps. 

To support (1), we should handle the so-called heating/cooling rules that are 
autogenerated rewrite rules that implement the specified evaluation orders. Our 
current implementation does not support these heating/cooling rules because 
they are conditional rules, and their conditions are those that state that an 
element is not a computation result. To prove such conditions, we need additional 
constructor axioms for the sorts/types that represent results of computation. To
support (2), we should extend our algorithms in Sect. 6 with unification modulo 
these collection datatypes. 


8 Evaluation 


In this section, we evaluate the performance of our implementation and discuss 
the experiment results, summarized in Table 1. We use two sets of benchmarks. 
Table 1. Performance of proof generation/checking (time measured in seconds).


                    Proof generation            Proof checking          Proof size
Programs            sem     rewrite   total     logic   task    total   kLOC     MB
10.two-counters     5.95    12.19     18.13     3.26    0.19    3.44    963.8    77
20.two-counters     6.31    24.33     30.65     3.41    0.38    3.79    1036.5   83
50.two-counters     6.48    73.09     79.57     3.52    0.98    4.50    1259.2   100
100.two-counters    6.75    177.55    184.30    3.50    2.10    5.60    1635.6   130
add8                11.59   153.34    164.92    3.40    3.09    6.48    1986.8   159
factorial           3.84    34.63     38.46     3.57    0.90    4.47    1217.9   97
fibonacci           4.50    12.51     17.01     3.44    0.21    3.65    971.7    77
benchexpr           8.41    53.22     61.62     3.61    0.80    4.41    1191.3   95
benchsym            8.79    47.71     56.50     3.53    0.72    4.25    1163.4   93
benchtree           8.80    26.86     35.66     3.47    0.32    3.80    1021.5   81
Langton             5.26    23.07     28.33     3.46    0.40    3.86    1048.0   84
mul8                14.39   279.97    294.36    3.48    7.18    10.66   3499.2   280
revelt              4.98    51.83     56.81     3.35    1.10    4.45    1317.4   105
revnat              4.81    123.44    128.25    3.37    5.28    8.65    2691.9   215
tautologyhard       5.16    400.89    406.05    3.55    14.50   18.04   6884.7   550


The first is our running example two-counters with different inputs (10, 20, 50,
and 100). The second is REC [11], which is a popular performance benchmark for
rewriting engines. We evaluate both the performance of proof object generation
and that of proof checking. Our implementation can be found in [16] and [3].
The main takeaways of our experiments are: 


1. Proof checking is efficient and takes a few seconds; in particular, the
task-specific checking time is often less than one second (“task” column in Table 1).

2. Proof object generation is slower and can take several minutes.

3. Proof objects are huge, often millions of LOC (wrapped at 80 characters).


Proof Object Generation. We measure the proof object generation time as
the time to generate complete proof objects following the algorithms in Sect. 6,
from the compiled language semantics (i.e., Kore definitions) and proof
parameters. As shown in Table 1, proof generation takes around 17 to 406 s on the
benchmarks, and the average is 107 s.

Proof object generation can be divided into two parts: that of the language
semantics Γ^L and that of the (one-step and multi-step) program executions.
Both parts are shown in Table 1 under columns “sem” and “rewrite”, respectively.
For the same language, the time to generate language semantics Γ^L is the same
(up to experimental error). The time for executions is linear in the number of
steps.
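
The summary figures quoted above can be re-derived from Table 1 with a short calculation. The Python sketch below copies the “total” proof generation times and the “rewrite” times of the two-counters runs from the table; reading the input (10, 20, 50, 100) as the number of rewrite steps is an assumption of this example.

```python
# "total" proof generation times (seconds), copied from Table 1.
total_generation = [
    18.13, 30.65, 79.57, 184.30,        # two-counters with inputs 10, 20, 50, 100
    164.92, 38.46, 17.01, 61.62, 56.50, 35.66,
    28.33, 294.36, 56.81, 128.25, 406.05,
]
mean_total = sum(total_generation) / len(total_generation)
print(round(mean_total))                # prints 107

# "rewrite" times of two-counters, keyed by input (assumed to equal the step count).
rewrite_time = {10: 12.19, 20: 24.33, 50: 73.09, 100: 177.55}
per_step = {steps: t / steps for steps, t in rewrite_time.items()}
for steps, cost in per_step.items():
    print(steps, round(cost, 2))        # seconds per rewrite step
```

Under that assumption, the per-step cost stays between roughly 1.2 and 1.8 seconds across the four runs.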


Proof Checking. Proof checking is efficient and takes a few seconds on our 
benchmarks. We can divide the proof checking time into two parts: that of the 
logical foundation and that of the actual program execution tasks. Both parts 
are shown in Table 1 under columns “logic” and “task”. The “logic” part includes 
formalization of matching logic and its basic theories, and thus is fixed for any 
programming language and program and has the same proof checking time (up to 
experimental error). The “task” part includes the language semantics and proof 
objects for the one-step and multi-step executions. Therefore, the time to check 
the “task” part is a more valuable and realistic measure, and according to our 
experiments, it is often less than 1s, making it acceptable in practice. 

As a pleasant surprise, the time for “task-specific” proof checking is roughly
the same as the time that it takes K to parse and execute the programs. In other 
words, there is no significant performance difference on our benchmarks between 
running the programs directly in K and checking the proof objects. 

There is much potential to optimize the performance of proof checking
and make it even faster than program execution. For example, in our
approach proof checking is an embarrassingly parallel problem, because each
meta-theorem can be proof-checked entirely independently. Therefore, we can
significantly reduce the proof checking time by running multiple checkers in parallel.


9 Related Work 


The idea of using proof generation to address the functional correctness of com- 
plicated systems has been introduced a long time ago. 

Interactive theorem provers such as Coq [19] and Isabelle [26] are often used 
to formalize programming language semantics and to reason about program 
properties. These provers often provide a high-level proof script language that 
allows the users to develop human-readable proofs, which are then automatically 
translated into lower-level proof objects that can be checked by the corresponding 
proof checkers. For example, the proof objects of Coq are of the form t : t’, where 
t’ is a term that represents the proposition to be proved and t represents a formal 
proof. The typing claim t : t’ can then be proof-checked by a proof checker that 
implements the typing rules of the calculus of inductive constructions (CIC) [8], 
which is the logical foundation of Coq. 

There are two main differences between provers such as Coq and our tech- 
nique. Firstly, Coq is not regarded as a language framework in the sense of Fig. 1 
because no language tools are autogenerated from the formal semantics. In our 
case, we need to be able to handle the correctness of individual tasks on a case- 
by-case basis to reduce the complexity. Secondly, Coq proof checking is based on 
CIC, which is arguably more complex than matching logic—the logical founda- 
tion of K as demonstrated in this paper. Indeed, the formalization of matching 
logic requires only 245 LOC which we display entirely in [16]. 

Another application of proof generation is to ensure the correctness of SMT 
solvers. These are popular tools to check the satisfiability of FOL formulas, 
written in a formal language containing interpreted functions and predicates. SMT 
solvers often implement complex data structures and algorithms, putting their 
correctness at risk. There is recent work such as [1] studying proof generation 
for SMT solvers. The research has been incorporated in theorem provers such as 
Lean, which attempts to bridge the gap between SMT reasoning and proof 
assistants more directly by building a proof assistant with efficient and sophisticated 
built-in SMT capabilities. As discussed in Sect. 7, our current implementation 
does not generate proofs for domain reasoning. So, we plan to incorporate the 
above SMT proof generation work into our future implementation. 


10 Conclusion 


We propose an innovative approach based on proof generation. The key idea is 
to generate proof objects as proof certificates for each individual task that the 
language tools conduct, on a case-by-case basis. This way, we avoid formally 
verifying the entire framework, which is practically impossible, and thus can 
make the language framework both practical and trustworthy. 


Abstract. Explainability is the process of linking part of the inputs 
given to a calculation to its output, in such a way that the selected 
inputs somehow “cause” the result. We establish the formal foundations 
of a notion of explainability for arbitrary abstract functions manipulating 
nested data structures. We then establish explanation relationships for 
a set of elementary functions, and for compositions thereof. A fully func- 
tional implementation of these concepts is finally presented and experi- 
mentally evaluated. 


1 Introduction 


Developers of information systems in all disciplines are facing increasing pres- 
sure to come up with mechanisms to describe how or why a specific result is 
produced—a concept called explainability. For example, a web application test- 
ing tool that discovers a layout bug can be asked to pinpoint the elements of 
the page actually responsible for the bug [24]. A process mining system finding 
a compliance violation inside a business process log can extract a subset of the 
log’s sequence of events that causes the violation [46]. Similarly, an event stream 
processing system can monitor the state of a server room and, when raising an 
alarm, identify what machines in the room are the cause of the alarm [40]. All 
these situations have in common that one is not only interested in an oracle 
that produces a simple Boolean pass/fail verdict from a given input, but also 
additional information that somehow links parts of this input to the result. 

Explainability is currently handled by ad hoc means, if at all. Hence, a devel- 
oper may write a script that checks a complex condition on some input object; 
however, for this script to provide an explanation, and not just a verdict, extra 
code must be written in order to identify, organize, and format the relevant input 
elements that form an explanation for the result. This extra code is undesirable: 
it represents additional work, is specific to the condition being evaluated, and 
relies completely on the developer’s intuition as to what constitutes a suitable 
“explanation”. Better yet would be a formal framework where this notion would 
be defined for arbitrary abstract functions, and accompanied by a generic and 
systematic way of constructing an explanation. 

In this paper, we present the theoretical foundations of a notion of 
explainability. Our model focuses on abstract functions whose input arguments and 
output values can be composite objects—that is, data structures that may be 
composed of multiple parts, such as lists. In contrast with existing works on the 
subject, which consider relationships between inputs and outputs as a whole, our 
framework is fine-grained: it is possible to point to a specific part of an output 
result, and construct an explanation that refers to precise parts of the input. 

Figure 1 illustrates the approach on a simple example. On the left is an input, 
which in this case is a text log of comma-separated values. Suppose that on this 
log, one wishes to extract the second element of each line, and check that the 
average of any three successive values is always greater than 3. It is easy to see 
that the condition is false, but where in the input log are the locations that 
cause it to be violated? By applying the systematic mechanism described in this 
paper, one can construct an explanation that is represented by the graph at the 
right. We can observe that the false output of the condition (the graph’s root) is 
linked to several leaves that designate parts of the input (character locations in 
each line of the file, identified by colors). Moreover, the graph contains Boolean 
nodes, indicating that the explanation may involve multiple elements (“and”), 
and that alternate explanations are possible (“or”). 
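
As a concrete illustration of the condition just described, the following Python sketch evaluates it on the log of Fig. 1 and lists the character locations of the values that take part in a violating window. It is only a brute-force illustration and not the systematic mechanism developed in this paper; the function names (`second_field`, `violating_windows`) are hypothetical, and the log is transcribed under the assumption that its fifth line reads `hill,-80,i`.

```python
LOG = """the,2,penny
fool,7,lane
on,18,come
the,2,together
hill,-80,i
strawberry,7,am
fields,1,the
forever,10,walrus"""


def second_field(line):
    """Return (value, first_col, last_col) of the second comma-separated field.

    Columns are 1-based character positions inside the line.
    """
    fields = line.split(",")
    first_col = len(fields[0]) + 2  # skip the first field and its comma
    last_col = first_col + len(fields[1]) - 1
    return int(fields[1]), first_col, last_col


def violating_windows(lines, width=3, threshold=3):
    """Return the windows of `width` successive lines whose average is not above `threshold`."""
    cells = [second_field(line) for line in lines]
    bad = []
    for start in range(len(cells) - width + 1):
        window = cells[start:start + width]
        if sum(value for value, _, _ in window) / width <= threshold:
            # Record (1-based line number, first column, last column) for each value.
            bad.append([(start + k + 1, c1, c2) for k, (_, c1, c2) in enumerate(window)])
    return bad


windows = violating_windows(LOG.splitlines())
condition_holds = not windows
# Distinct input locations involved in at least one violating window.
locations = sorted({cell for window in windows for cell in window})
```

Under these assumptions, three windows violate the condition (lines 3 to 5, 4 to 6, and 5 to 7), and the distinct locations they involve coincide with the leaves of the graph in Fig. 1. The sketch does not recover the “and”/“or” structure that links these leaves to the root.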


the,2,penny 
fool,7,lane 
on,18,come 
the,2,together 
hill,-80,i 
strawberry,7,am 
fields,1,the 
forever,10,walrus 

[Explanation graph: the root is the output “false”; its leaves are the input locations L3 C4-5, L4 C5, L5 C6-8, L6 C12 and L7 C8.] 


Fig. 1. Left: a simple text file. Right: an explanation graph obtained from the evalua- 
tion of a function on this input. 


First, in Sect. 2, we review existing works related to the concepts of causality, 
provenance, and taint propagation, which are the notions closest to our concerns. 
Section 3 lays out the theoretical foundations of our framework: it introduces 
the notion of parts of composite objects, and formally defines a relation between 
parts of inputs and outputs of an arbitrary function, called explanation. More- 
over, it shows how an explanation can be constructed for a function that is 
a composition of basic functions, by composing their respective explanations. 
Section 4 then illustrates these definitions by demonstrating what constitutes 
an explanation on a small yet powerful set of basic functions. Taken together, 
these results make it possible to easily construct explanations for a wide range 
of computations. 

To showcase the feasibility of this approach, Sect. 5 presents Petit Poucet, a 
fully-functioning Java library that implements these concepts. The library allows 
users to create their own complex functions by composing built-in primitives, 
and can automatically produce an explanation for any result computed by these 
functions on a given input. Experiments on a few examples provide insight into 
the time and memory overhead incurred by the handling of explanations. Finally, 
Sect. 6 identifies a number of exciting open theoretical questions that arise from 
our formal framework. 


2 Related Work 


Explainability can be seen as a particular case of a more general concept called 
lineage, where inputs and outputs of a calculation are put in relation according 
to various definitions. Related works around this notion can be separated into a 
few categories, which we briefly describe below. 


2.1 Causality 


In the field of testing and formal verification of software systems, lineage often 
takes the form of causality. A classical definition is given by Halpern and Pearl 
[28], based on what is called “counterfactual dependence”: A is a cause of B if the 
absence of A entails the absence of B. According to this principle, some feature 
of an object causes the failure of a verification or testing procedure if the absence 
of this feature instead causes the procedure to emit a passing verdict. This notion 
can be transposed for various types of systems. When a condition is expressed as 
a propositional logic formula, the cause of the true or false value of the formula 
can be constructed in the form of an explanatory sub-model; informally, it can 
be thought of as the smallest set of propositional variables whose value implies 
the value of the formula. 

In the case of conditions expressed on state-based systems, the cause of a vio- 
lation has been taken either as the shortest prefix that guarantees the violation 
of the property regardless of what follows [23], or as a minimal set of word-level 
predicates extracted from a failed execution [50]. A platform for hardware for- 
mal verification, called RuleBasePE, expands on this latter definition to identify 
components of a system that are responsible for the violation of a safety property 
on a single execution trace [5]. 

Causality has been criticized as an all-or-nothing notion; responsibility has 
been introduced as a refinement of this concept, where the involvement of some 
element A as the cause of B can be quantified [12]. The problem of deciding 
causality also has a high computational complexity; as a matter of fact, deter- 
mining if A is a cause of B in a model only allowing Boolean values to variables 
is already NP-complete [18]. Tight automata [31,43] have also been developed 
to produce minimal counter-examples, which in this case, are sequences of states 
produced by a traversal of the finite-state machine. Explanatory sub-models can 
also be extended to the temporal case [19]. Another approach involves the com- 
putation of a so-called “minimal debugging window”, which is a small segment 
of the input trace that contains the discovered violation [36]. 

Finally, distance-based approaches compare a faulty trace with the closest 
valid trace (according to a given distance metric) [22]; the differences between 
the two traces are defined as the cause of the failure. A similar principle is 
applied in a software testing technique called delta debugging to identify values 
of variables in a program that are responsible for a failure [15]. 


2.2 Provenance in Information Systems 


In a completely different direction, a large amount of work on lineage has been 
done in the field of databases, where this notion is often called provenance. A 
thorough survey of related approaches on provenance [9] reveals that this concept 
has been studied in several different areas of data management, such as scientific 
data processing and database management systems. 

Research in this field typically distinguishes between three types of prove- 
nance. The first type is called why-provenance [17]: to each tuple t in the output 
of a relational database query, why-provenance associates a set of tuples present 
in the input of the query that helped to “produce” t. How-provenance, as its 
name implies, keeps track not only of what input tuples contribute to the output, 
but also in which way these tuples have been combined to form the result [21]. 
For example, a symbolic polynomial like $t^2 + t \cdot t'$ indicates that an output tuple 
can be explained in two alternative ways: either by using tuple t twice, or by 
combining t and t’. Finally, where-provenance describes where a piece of data is 
copied from [8]. It is typically expressed at a finer level of granularity, by 
allowing individual values inside an output tuple to be linked to individual values 
of one or more input tuples. One possible way of doing this is through a technique 
called annotation-propagation, where each part of the input is given symbolic 
“annotations”, which are then percolated and tracked all the way to the output [7]. 
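
The polynomial reading of how-provenance can be made concrete with a few lines of Python. The sketch below is an assumption-laden illustration, not the formalism of [21]: polynomials are stored as a `Counter` mapping a monomial (a sorted tuple of tuple identifiers) to its coefficient, and the helper names `var`, `plus` and `times` are hypothetical. Addition stands for alternative derivations and multiplication for the joint use of tuples.

```python
from collections import Counter


def var(name):
    """Polynomial made of a single tuple identifier."""
    return Counter({(name,): 1})


def plus(p, q):
    """Alternative derivations: add the two polynomials."""
    return p + q


def times(p, q):
    """Joint use of tuples: multiply the two polynomials."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            # A monomial is kept as a sorted tuple so that t·t' and t'·t coincide.
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


t, t_prime = var("t"), var("t'")
# t · (t + t') expands to t² + t·t'
poly = times(t, plus(t, t_prime))
```

Here `poly` holds two monomials with coefficient 1: `("t", "t")`, which records that tuple t is used twice, and `("t", "t'")`, which records that t and t’ are combined.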

A recent survey reveals the existence of more than two dozen provenance- 
aware systems [38]. Where-provenance has been implemented into Polygen [51], 
DBNotes [11], MONDRIAN [20], MXQL [48] and ORCHESTRA [29]. The SPIDER 
system performs a slightly different task, by showing to a user the “route” from 
input to output that is being taken by data when a specific database query is 
executed [10]. The foundations for all these systems are relational databases, 
where sets of tuples are manipulated by operators from relational algebra, or 
extensions of SQL. 

Taken in a broad sense, we also list in this category various works that aim 
to develop explainable Artificial Intelligence [42]. Models used in AI vary widely, 
ranging from deep neural networks to Bayesian rules and decision trees; conse- 
quently, the notion of what constitutes an explanation for each of them is also 
very variable, and is at times only informally stated. 


2.3 Taint Analysis and Information Flow 


A last line of work considers the linkage between the inputs and the 
outputs produced by a piece of procedural code, mostly for security 
reasons. Dynamic taint analysis consists of marking and tracking certain data 
in a program at run-time. A typical use case for taint analysis is to check whether 
sensitive information (such as a password) is being leaked into an unprotected 
memory location, or if a “tainted” piece of input such as a user-provided string 
is being passed to a function like a database query without having been sanitized 
first (opening the door to injection attacks). 

In this category, TaintCheck has been developed into a system where each 
memory byte is associated with a 4-byte pointer to a taint data structure [37]; 
program inputs are marked as tainted, and the system propagates taint markers 
to other memory locations during the execution of a program; this concept has 
been extended to the operating system as a whole in an implementation called 
Asbestos [47]. Hardware implementations of this principle have also been pro- 
posed [16,44]. Dytan [14] is a notable system for taint propagation and analysis 
on programs written in assembly language. We shall also mention the GIFT 
system, as well as a compiler based on it called Aussum [32]. For its part, RIFLE 
focuses on information flow [45], while TaintBochs is a system that has been 
used to track the lifetime of sensitive data inside the memory of a program [13]. 

The capability of following taint markings on the inputs of a program can be 
used in many ways. For example, a system called COMET uses taint propagation 
to improve the coverage of an existing test suite [33]. Taint analysis can also be 
used to quantify the amount of information leak in a system [34]. Stated simply, 
the propagation of taint markings can be seen as a form of how-provenance, 
applied to variables and procedural code instead of tuples and relational queries. 
Note however that it operates in a top-down fashion: mark the inputs, and track 
these markings on their way to the output. In contrast, we shall see that our 
notion of explanation is bottom-up: point at a part of the output, and retrace 
the parts of the input that are related to it. 
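
The top-down character of taint propagation can be seen in a minimal Python sketch. It does not reproduce any of the systems cited above; the class `Tainted` and its label sets are hypothetical, and only addition and multiplication are instrumented. Each value carries the labels of the inputs that flowed into it, and every operation merges the labels of its operands.

```python
class Tainted:
    """A value paired with the set of input labels that flowed into it."""

    def __init__(self, value, labels=()):
        self.value = value
        self.labels = frozenset(labels)

    def _combine(self, other, op):
        # Untainted constants are wrapped with an empty label set.
        other = other if isinstance(other, Tainted) else Tainted(other)
        return Tainted(op(self.value, other.value), self.labels | other.labels)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)


# Mark the inputs, then let the markings follow the computation.
x = Tainted(0, {"x"})
y = Tainted(1, {"y"})
z = Tainted(5, {"z"})
result = x * y + 2
```

In this run, `result.labels` contains both `"x"` and `"y"` and omits `"z"`, which never takes part in the computation. The markings are attached to the inputs first and simply follow the data forward, regardless of which part of the output one is interested in.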


3 A Formal Definition of Explanation 


The problem we consider can be simply stated. Given an abstract function f and 
an input argument x, establish a formal relationship that explains f(x) based 
on x. While this is more or less closely related to the works we presented in the 
previous section, the solution we propose has a few distinguishing features. 

First, x and f(x) may be composite objects made of multiple “parts”, and 
it is possible to relate specific parts of an output to specific parts of an input. 
Second, if f is itself a composition $g \circ h$, it is possible to construct the 
input-output relationships of f from the individual input-output relationships of g 
and h. Third, these relations are not defined ad hoc for each individual function, 
but come as consequences of a general definition. Finally, given x and f(x), 
determining the parts of x in relation with a given part of f(x) is tractable. 

In this section, we establish the formal foundations of our proposed approach. 
We start by defining an abstract “part-of” relation between abstract objects, 
based on the notion of designator, and establish some properties of this relation. 
We then propose a definition of explanation for arbitrary functions, and discuss 
how it differs from existing causality and lineage relationships mentioned earlier. 


3.1 Object Parts 


Let $U = \bigcup_i O_i$ be a union of sets of objects; the sets $O_i$ are called types. We 
suppose that $U$ contains a special object, noted $\emptyset$, that represents “nothing”. Types 
are disjoint sets, with the exception of $\emptyset$ which, for convenience, is assumed to 
be part of every type. 

Each type O is associated with a set $\Pi_O$, whose elements are called parts. The 
set $\Pi_O$ contains functions of the form $\pi : O \to O'$; it is expected that $\pi(\emptyset) = \emptyset$ 
for every such function. In addition, we impose that $\Pi_O$ always contains two 
other functions defined as $1 : x \mapsto x$, and $0 : x \mapsto \emptyset$. We shall overload the term 
“part” and say that an object $o' \in O'$ is a part of some other object $o \in O$ if it 
is the result of applying some $\pi \in \Pi_O$ to o. A proper part is any part other than 
1 and 0; we shall use $\Pi_O^{*}$ to designate the set of proper parts for a given type. 
A type O is called scalar if $\Pi_O^{*} = \emptyset$; otherwise it is called composite. 



Fig. 2. Illustration of two abstract composite objects of the same type. Composite parts 
are represented by gray rectangles; scalar parts are represented by colored rectangles. 
(Color figure online) 


Figure 2 shows an example of two abstract composite objects X and Y of the 
same type O, which we suppose has a set of three proper parts called $\pi_A$, $\pi_B$ and 
$\pi_C$. Parts of an object can themselves be of a composite type; we assume here 
that type A has four parts $\pi_1, \dots, \pi_4$, type B has two parts $\pi_5, \pi_6$, and type C 
is a scalar. For example, $\pi_B(X)$ corresponds to the rectangle numbered 4, and 
$\pi_C(Y)$ is the rectangle numbered 7. Consistent with our definitions, 1 designates 
the whole object, hence 1(X) is the rectangle 1. These parts are not all present 
in all objects of a given type; for example we see that $\pi_C(X) = \emptyset$. 

Parts can be composed in the usual way: if $\pi : O \to O'$ and $\pi' : O' \to O''$, 
their composition $\pi' \circ \pi$ is defined as $o \mapsto \pi'(\pi(o))$. This corresponds intuitively 
to the notion of the part of some part of an object, and will allow us to point 
to arbitrarily fine-grained portions of input arguments or output values. For 
example, in Fig. 2, we have that $(\pi_1 \circ \pi_A)(X)$ corresponds to the rectangle 
numbered 8, which indeed corresponds to part $\pi_1$ of the part $\pi_A$ of object X. If 
$\Pi = \{\pi_1, \dots, \pi_n\}$ is a set of parts and $\pi$ is another part, we will abuse notation 
and write $\Pi \circ \pi$ to mean the set $\{\pi_1 \circ \pi, \dots, \pi_n \circ \pi\}$. 

If $\pi$ is a part of an object, we shall say that $\pi' \circ \pi$ is a refinement of $\pi$; 
inversely, $\pi$ is a generalization of $\pi' \circ \pi$, as it corresponds to a “greater” part of 
the object. Remark that since $(1 \circ \pi) = (\pi \circ 1) = \pi$, any part $\pi$ is simultaneously 
a refinement and a generalization of itself. If $\pi$ is a proper part of o, all its 
refinements are also considered as parts of o. The notions of refinement and 
generalization can be extended to sets of parts. Given two sets $\Pi = \{\pi_1, \dots, \pi_m\}$ 
and $\Pi' = \{\pi'_1, \dots, \pi'_n\}$, $\Pi$ is a refinement of $\Pi'$ if there exists an injection $\mu$ between 
$\Pi$ and $\Pi'$ such that $\mu(\pi) = \pi'$ if and only if $\pi$ is a refinement of $\pi'$; conversely, 
$\Pi'$ is a generalization of $\Pi$. Again, a refinement of a set of parts $\Pi$ picks fewer 
parts, and more selective parts of an object than $\Pi$. In the example of Fig. 2, the 
set $\Pi = \{\pi_1 \circ \pi_A, \pi_C\}$ is a refinement of $\Pi' = \{\pi_A, \pi_5 \circ \pi_B, \pi_C\}$ (the injection 
here being the two associations $\pi_1 \circ \pi_A \mapsto \pi_A$ and $\pi_C \mapsto \pi_C$). 

Given a part $\pi$, two objects $o, o' \in O$ are said to differ on $\pi$ if $\pi(o) \neq \pi(o')$. 
This can be illustrated in Fig. 2. Objects X and Y differ on $\pi_C$, since $\pi_C(X) \neq 
\pi_C(Y)$. They also differ on $\pi_A$ (since rectangles 3 and 5 are different) and on 
$\pi_4 \circ \pi_A$ (which produces rectangle 11 for X and $\emptyset$ for Y). However, they do not 
differ on $\pi_B$ (rectangles 4 and 6 are identical), and they do not differ on $\pi_3 \circ \pi_A$ 
(rectangles 10 and 16 are identical). This example shows that if two objects differ 
on a part $\pi$, they do not necessarily differ on a refinement of $\pi$ (compare $\pi_A$ and 
$\pi_1 \circ \pi_A$). Given a set of parts $\Pi = \{\pi_1, \dots, \pi_n\}$, two objects differ on $\Pi$ if they 
differ on some $\pi_i \in \Pi$. Obviously, if objects differ on $\Pi$, they also differ on any 
of its generalizations. 

Finally, two parts $\pi$ and $\pi'$ intersect if there exist two parts $\pi_I$ and $\pi'_I$ 
(different from 0) such that whenever $(\pi_I \circ \pi)(o) \neq \emptyset$ and $(\pi'_I \circ \pi')(o) \neq \emptyset$, then 
$(\pi_I \circ \pi)(o) = (\pi'_I \circ \pi')(o)$. Two sets of parts $\Pi$ and $\Pi'$ intersect if at least one of 
their respective parts intersect. In Fig. 2, the sets $\{\pi_A\}$ and $\{\pi_2 \circ \pi_A, \pi_C\}$ intersect 
(they have in common the part $\pi_2 \circ \pi_A$), while the sets $\{\pi_A\}$ and $\{\pi_5 \circ \pi_B\}$ do 
not. 
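
The notions of part, composition and “differ on” can be mimicked with plain Python callables. The sketch below is only an illustration under stated assumptions: objects are nested Python lists, `None` stands for the special object “nothing”, and the helpers `nth`, `compose` and `differ_on` are hypothetical names. The objects `X` and `Y` are made up for the example and are not those of Fig. 2.

```python
NOTHING = None  # stands for the special object "nothing"


def whole(obj):
    """The part 1: designates the whole object."""
    return obj


def nothing(obj):
    """The part 0: designates nothing."""
    return NOTHING


def nth(i):
    """Proper part selecting the i-th element (1-based) of a list."""
    def part(obj):
        if isinstance(obj, list) and 1 <= i <= len(obj):
            return obj[i - 1]
        return NOTHING  # absent element, scalar object, or "nothing" itself
    return part


def compose(outer, inner):
    """The part `outer` of the part `inner` of an object."""
    return lambda obj: outer(inner(obj))


def differ_on(part, a, b):
    """Two objects differ on a part if it designates different things in each."""
    return part(a) != part(b)


X = [["a", "b", "c", "d"], ["e", "f"], "g"]
Y = [["a", "b", "c"], ["e", "f"], "h"]

first = nth(1)
third_of_first = compose(nth(3), first)    # a refinement of `first`
fourth_of_first = compose(nth(4), first)   # another refinement of `first`
```

With these objects, `X` and `Y` differ on `first` and on `fourth_of_first` (which yields `"d"` for `X` and nothing for `Y`), but not on `third_of_first`. This mirrors the remark above that differing on a part does not imply differing on each of its refinements.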


3.2 A Definition of Explanation 


We are now interested in relations between a part of a function’s output, and 
one or more parts of that function’s input. We shall focus on the set of unary 
functions $f : O \to O'$. For the sake of simplicity, functions of multiple input 
arguments will be modeled as unary functions whose input is a composite type 
that contains the arguments. These composite types will be used informally to 
illustrate the notions, and will be formally defined in Sect. 4. 


Definition 1. Let $f : O \to O'$ be a function, $\Pi \subseteq \Pi_O$ be a set of parts of 
the function’s input type, and $\pi \in \Pi_{O'}$ be a part of the function’s output type. 
Consider a set $\Pi$ such that there exists an object o’ that differs from o only on 
$\Pi$, and for which f(o) and f(o’) differ on $\pi$. We say that $\Pi$ explains $\pi$ if $\Pi$ is 
minimal, meaning that no refinement of $\Pi$ satisfies the previous condition. 


As an example, consider the function $f : (x, y) \mapsto xy$, with x = 1, y = 1. 
For this particular input object, the part designating the first element of the 
input is an explanation, as changing it to any other value changes the result of 
the function; the same argument can be made for the part that designates the 
second element of the input. Consider now the case where x = 0, y = 1. This 
time, the value of the second element is irrelevant: changing it to anything else 
still produces 0. Therefore, the second element of the input is not an explanation 
for the output; the first element of the input alone “explains” the result of 0 in 
the output. 

Based on these simple definitions, we can already put our framework in con- 
trast with some of the papers we surveyed in Sect. 2. First, note how this defini- 
tion is different from counterfactual causality [1,28]. This can be illustrated by 
considering the previous function, in the case where x = 0 and y = 0. Argument 
x is not a counterfactual cause of the output value, as changing it to anything 
else still produces 0; the same argument shows that y is not a cause of the output 
either. Therefore, one ends up with the counter-intuitive conclusion that none 
of the inputs cause the output. 

In contrast, there exists a minimal set of parts that satisfies our explanation 
property, which is the set that contains both the first and the second element. 
Indeed, there exists another input object that differs on both elements, and 
which produces a different result. This is in line with the intuition that either 
element is sufficient to explain the null value produced by the function, and 
that therefore both need to be changed to have an impact. It highlights a first 
distinguishing feature of our approach: the presence of multiple parts inside a 
set indicates a form of “disjunction” or “alternate” explanations, something that 
cannot be easily accounted for in many definitions of causality.¹ 
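
The three cases discussed for the product function can be checked by brute force. The following Python sketch is an illustration only: it searches a small finite sample of values (`DOMAIN`, an assumption made to keep the search finite, whereas the definition quantifies over all objects), it treats the inputs as scalars so that the refinements of a set of positions are its subsets, and the function names are hypothetical.

```python
from itertools import combinations, product
from math import prod

DOMAIN = range(-2, 3)  # assumed finite sample of values, for illustration only


def variants(x, positions, domain):
    """Yield every input that differs from x on exactly the given positions."""
    choices = [[v for v in domain if v != x[i]] for i in positions]
    for combo in product(*choices):
        y = list(x)
        for i, v in zip(positions, combo):
            y[i] = v
        yield tuple(y)


def changes_output(f, x, positions, domain=DOMAIN):
    """True if changing exactly these positions of x can change the value of f."""
    return any(f(y) != f(x) for y in variants(x, positions, domain))


def explanations(f, x, domain=DOMAIN):
    """Minimal sets of 1-based input positions that satisfy the explanation property."""
    found = []
    for size in range(1, len(x) + 1):
        for positions in combinations(range(len(x)), size):
            if any(set(s) < set(positions) for s in found):
                continue  # a strict subset already works, so this set is not minimal
            if changes_output(f, x, positions, domain):
                found.append(positions)
    return [tuple(i + 1 for i in s) for s in found]


def counterfactual_positions(f, x, domain=DOMAIN):
    """1-based positions whose change alone modifies the output."""
    return [i + 1 for i in range(len(x)) if changes_output(f, x, (i,), domain)]
```

On the input (1, 1), `explanations(prod, (1, 1))` returns `[(1,), (2,)]`; on (0, 1) it returns `[(1,)]`; and on (0, 0) it returns `[(1, 2)]`, whereas `counterfactual_positions(prod, (0, 0))` is empty, in line with the discussion above.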

Why-provenance is expressed on tuples manipulated by a relational query 
[17], but our simple case can easily be adapted by assuming that x, y and f((x, y)) 
each are tuples with a dummy attribute a. The definition then leads to the 
conclusion that both x and y are considered to “produce” the output, whereas 
explanation rather concludes that either explains the output; the use of how- 
provenance [21] would produce a similar verdict. Where-provenance [8] is even 
less appropriate here, as it makes little sense to ask whether the product of two 
numbers “copies” any of the input arguments to its output. 

Since a single explanation is a set of parts, the set of all explanations is 
therefore a set of sets of parts. As we have shown, a set of parts intuitively 
represents an alternative (either part is an explanation). In turn, elements of 
a set of sets of parts represent the fact that each of them is an explanation. 
Therefore, a concise graphical notation for representing sets of sets of parts is 
the and-or tree, such as the one shown in Fig. 3. In the present case, leaves of the 
tree each represent a (single) object part, while non-leaf nodes are labeled either 
with “and” ($\wedge$) or “or” ($\vee$). For example, this tree represents the set of sets 
$\{\{\pi_1\}, \{\pi_2 \circ \pi_3, \pi_5 \circ \pi_6, \pi_4\}, \{\pi_2 \circ \pi_3, \pi_7 \circ \pi_6, \pi_4\}\}$.² 


1 Case in point, in [1] a cause is assumed to be a conjunction of assertions of the form 
X = x. 
2 Obviously, there exist multiple equivalent trees for the same set of sets of parts. 


4 Building Explanations for Functions 


Equipped with these abstract definitions, we shall now establish properties of a 
handful of elementary functions, namely logical and arithmetic operations, and 
list manipulations. The reader may find that this section is stating the obvious, 
as many of the results we present correspond to very intuitive notions. The main 
interest of our approach is that these seemingly trivial conclusions are not defined 
ad hoc, but rather come as consequences of the general definition of explanation 
introduced in the previous section. 


Fig. 3. An example of and-or tree where leaves are object parts. 


We first consider as scalar types the set of Boolean values B, the set of real 
numbers R, and the set of characters S. Then, we shall denote by $V(O_1, \dots, O_n)$ 
the set of vectors of size n, where the i-th element is of type $O_i$. Its set of parts 
$\Pi_{V(O_1, \dots, O_n)}$ contains 1 and 0, as well as all functions $[i] : V(O_1, \dots, O_n) \to O_i$ 
defined as: 

$$[i] : (o_1, \dots, o_n) \mapsto \begin{cases} o_i & \text{if } 1 \le i \le n \\ \emptyset & \text{otherwise.} \end{cases}$$

In other words, the proper parts of a vector are each of its elements. We shall 
designate finite vectors of variable length and uniform type O by $V(O^*)$. 
Finally, character strings will be viewed as the type V(S*), i.e. finite words over 
the alphabet of symbols S. We stress that, although our concept of explanation 
is illustrated on a small set of functions operating on these types, it is by no 
means limited to these functions or these types. 


4.1 Conservative Generalizations 


Some of the functions we shall consider return objects that may be compos- 
ite; these functions introduce the additional complexity that one may want to 
refer not only to the whole output of the function, but also to a single part of 
that function’s output. What is more, the inputs of these functions can also be 
composite objects, and explicitly enumerating all their minimal sets of parts for 
explanation may not be possible. 

Take for example the function $f : V(O) \to V(O)$, which simply returns its 
input vector as is. Suppose that we focus on $\pi = [1]$, the first element of the 
output vector. Clearly, the set {[1]}, pointing to the first element of the input 
vector, should be recognized as the only one that explains this output. However, 
if O is a composite type, this set is not minimal, and should be further broken 
down into all the parts of O. Besides being unmanageable, this enumeration also 
misses the intuition that what explains the first element of the output is simply 
the first element of the input. 

In the following, we employ an alternate approach, which will be to define a 
set of conservative generalizations of the function’s input parts. For most 
functions, the principle will be the same: given an output part $\pi$, we shall define a 
set of sets of input parts $\{\Pi_1, \dots, \Pi_n\}$, and demonstrate that any set of input 
parts $\Pi'$ that explains $\pi$ on some input intersects with one of the $\Pi_i$. It follows 
that any minimal set of input parts that explains the output is a refinement of one 
of the $\Pi_i$. 


Fig. 4. The sets of parts $\Pi_1$, $\Pi_2$ and $\Pi_3$ (in green) represent a conservative 
generalization of the minimal sets of explanation parts (yellow circles) (Color figure online). 


This is illustrated in Fig. 4, where the minimal sets of explanations of an 
input are illustrated in yellow. Here, the set $\{\Pi_1, \Pi_2, \Pi_3\}$ has been identified as 
the target set of sets of input parts. It is possible to see that if one establishes 
that any set lying outside of the green ovals is not an explanation, it follows that 
the minimal sets of parts for explanation are all contained inside one of $\Pi_1$, $\Pi_2$ 
and $\Pi_3$. Note that this is a generalization, as, for example, $\Pi_2$ does not contain 
any minimal set. However, this generalization is conservative, in the sense that 
all minimal sets are indeed contained within a green oval. 

The goal is therefore to come up with generalizations that are, in a sense, as 
tight as possible. In the example of function f above, we could easily demonstrate 
that any set of input parts $\Pi$ that has an impact on the first element of the output 
must contain a refinement of the first element of the input, and therefore identify 
$\{\{[1]\}\}$ as a sufficient set that “covers” all the minimal explanation input 
parts. It so happens that this set corresponds exactly to the intuitive result we 
expected in the first place: the first element of the input vector contains all the 
parts that impact the output, and no other part of the input has this property. 


4.2 Explanation for Scalar Functions 


In the following, we provide the formal definition of conservative generalizations 
of the explanation relationship for a number of elementary functions. We start 
with functions performing basic arithmetic operations returning a scalar value. 
Establishing explanation for addition over a vector of numbers is trivial. 


Theorem 1. Let $f : V(R^*) \to R$ be the function defined as 
$(x_1, \dots, x_n) \mapsto x_1 + \cdots + x_n$. For any input $(x_1, \dots, x_n)$, $\Pi$ explains 1 on 
$(x_1, \dots, x_n)$ if and only if $\Pi = \{[i]\}$ for some $1 \le i \le n$. 


In other words, any single element of the input vector explains the result; the 
case of subtraction is defined identically. Multiplication, however, has a different 
definition. This is caused by the fact that 0 is an absorbing element, hence its 
presence suffices for a product to yield zero, as is explained by the following 
theorem. 


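Theorem 2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the function defined as
$(x_1, \dots, x_n) \mapsto x_1 \cdots x_n$. For a given input $(x_1, \dots, x_n)$, $\Pi$ explains
$\mathbf{1}$ if and only if:


- $\Pi = \{[i]\}$ for some $1 \le i \le n$, if for all $1 \le j \le n$, $x_j \neq 0$
- $\Pi = \bigcup_{j \in \{j \,:\, 1 \le j \le n \text{ and } x_j = 0\}} \{[j]\}$ otherwise.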


Proof. Suppose that all elements of the input vector are non-null. Let $\Pi = \{[i]\}$
for some $1 \le i \le n$. Clearly, any input that differs from $(x_1, \dots, x_n)$ only in the
i-th element produces a different product, and hence $\Pi$ is a minimal explanation.
Suppose now that at least one element of the vector is null; in such a case, the
function returns 0. Let $S = \{j : 1 \le j \le n \text{ and } x_j = 0\}$ be the set of all such
vector indices. A vector must differ from the input in at least all these positions
in order to produce a different output, otherwise the function still returns zero.
The only refinements of this set are its strict subsets, but none is sufficient to
change the output, and hence the defined set is the only minimal set satisfying
the explanation property.
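As an illustration, the two theorems above can be sketched in a few lines of Python. This is only a minimal sketch under our own conventions: an explanation is represented as a set of 1-based positions standing for the designators $[i]$, and the function names `explain_sum` and `explain_product` are hypothetical, not part of any library.

```python
def explain_sum(xs):
    """Theorem 1: every single position {[i]} explains the sum."""
    return [{i} for i in range(1, len(xs) + 1)]


def explain_product(xs):
    """Theorem 2: the answer depends on whether the input contains a zero."""
    zeros = {i for i, x in enumerate(xs, start=1) if x == 0}
    if not zeros:
        # No absorbing element: changing any single position changes the product.
        return [{i} for i in range(1, len(xs) + 1)]
    # Otherwise all zero positions must change together.
    return [zeros]


print(explain_sum([4, 0, 7]))         # [{1}, {2}, {3}]
print(explain_product([4, 5, 7]))     # [{1}, {2}, {3}]
print(explain_product([4, 0, 7, 0]))  # [{2, 4}]
```

Each returned set is one alternative minimal explanation; for a product containing zeros there is a single alternative that groups all the zero positions.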


The same argument can be made of the Boolean function that computes the
conjunction of a vector of Boolean values. If all elements of the vector are $\top$,
changing any of them produces a change of value, and hence each $\{[i]\}$ explains
the output. Otherwise, the set $\Pi = \bigcup_{j \in \{j \,:\, 1 \le j \le n \text{ and } x_j = \bot\}} \{[j]\}$ is the only
minimal set of input parts that explains the output, by a reasoning similar to
the case of multiplication above. A dual argument can be made for disjunction
by swapping the roles of $\top$ and $\bot$.
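The Boolean case admits the same kind of sketch. The code below is a hedged illustration only; the names `explain_and` and `explain_or` are ours, and positions are again 1-based integers standing for the designators $[i]$. Disjunction is obtained by duality, that is, by negating the inputs before reusing the rule for conjunction.

```python
def explain_and(bs):
    """Minimal explanations for the conjunction of a vector of Booleans."""
    false_positions = {i for i, b in enumerate(bs, start=1) if not b}
    if not false_positions:
        # All values are true: flipping any single one changes the result.
        return [{i} for i in range(1, len(bs) + 1)]
    # At least one false value: all of them must change together.
    return [false_positions]


def explain_or(bs):
    """Dual case: swap the roles of true and false."""
    return explain_and([not b for b in bs])


print(explain_and([True, False, True, False]))  # [{2, 4}]
print(explain_or([False, True, False]))         # [{2}]
```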

The case of the remaining usual arithmetic and Boolean functions can be 
dispatched easily. Functions taking a single argument (abs, etc.) obviously have 
this single argument as their only minimal explanation part. 

We finally turn to the case of a function that extracts the k-th element of a
vector. We recall that vectors can be nested, and hence this element may itself
be composite. The intuition here is that what explains a part of the output is
that same part in the k-th element of the input.

Theorem 3. Let $f_k : V(O^n) \to O$ be the function defined as $(x_1, \dots, x_n) \mapsto x_k$
if $1 \le k \le n$, and $(x_1, \dots, x_n) \mapsto \varnothing$ otherwise. Let $\pi \in \Pi_O$ be an arbitrary part
of O. For any input $(x_1, \dots, x_n)$, $\Pi$ explains $\pi$ for $(x_1, \dots, x_n)$ if and only if
$\Pi = \{\pi \circ [k]\}$ and $1 \le k \le n$.


Proof. If $k \le 0$ or $k > n$, $f_k$ produces $\varnothing$ regardless of its argument, and so
no set of input parts explains the result. Otherwise, it suffices to observe that
$(\pi \circ [k])((x_1, \dots, x_n)) = \pi(x_k) = \pi(f_k((x_1, \dots, x_n)))$, and hence $\{\pi \circ [k]\}$ explains the
output. No other set satisfies this condition.
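The identity used in this proof can be checked on a small nested list. In the sketch below, a designator is represented (by assumption, for illustration only) as a tuple of 1-based indices that are followed from left to right, so that $\pi \circ [k]$ becomes the tuple `(k,) + pi`. The helper names `follow`, `compose` and `explain_nth` are hypothetical.

```python
def follow(designator, obj):
    """Follow a designator, given as a tuple of 1-based indices."""
    for index in designator:
        obj = obj[index - 1]
    return obj


def compose(pi, first):
    """Designator 'pi o first': apply `first`, then `pi`."""
    return first + pi


def explain_nth(k, n, pi=()):
    """Theorem 3: part pi of the k-th element is explained by pi o [k]."""
    if 1 <= k <= n:
        return [{compose(pi, (k,))}]
    return []  # k out of range: nothing explains the result


x = ["a", ["b", "c"], "d"]
pi = (2,)  # second element of the extracted value
print(explain_nth(2, len(x), pi))              # [{(2, 2)}]
print(follow((2, 2), x) == follow(pi, x[1]))   # True
```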


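Table 1. Definition of elementary vector functions studied in this paper.

$$
\begin{array}{rcl}
[i]((o_1, \dots, o_n)) & = & \begin{cases} o_i & \text{if } 1 \le i \le n \\ \varnothing & \text{otherwise} \end{cases} \\
\alpha_f((o_1, \dots, o_n)) & = & (f(o_1), \dots, f(o_n)) \\
\omega_f^k((o_1, \dots, o_n)) & = & (f((o_1, \dots, o_k)), \dots, f((o_{n-k+1}, \dots, o_n))) \\
\text{¿}((b, o, o')) & = & \begin{cases} o & \text{if } b = \top \\ o' & \text{otherwise} \end{cases}
\end{array}
$$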


4.3 Explanation for Vector Functions 


We shall delve into more detail on basic functions that produce a value that
may be of non-scalar type, summarized in Table 1 (function [i] has already been
discussed earlier). Here, we must consider the fact that an explanation may refer
to a part of an element of their output, i.e. designators of the form $\pi \circ [i]$, with
$\pi$ some arbitrary designator.

The first function, denoted $\alpha_f$, applies a function f on each element of an input
vector, resulting in an output vector of the same cardinality.


Theorem 4. On a given input $(x_1, \dots, x_n)$, $\Pi$ explains $\pi \circ [i]$ of $\alpha_f$ if and only
if $\Pi = \{\Pi' \circ [i]\}$ for some set of parts $\Pi'$, and $\Pi'$ explains $\pi$ for $x_i$ of f.


Proof. The i-th element of the output of $\alpha_f$ is $f(x_i)$; if $\Pi'$ explains $x_i$ of f, then
$\Pi' \circ [i]$ explains $(x_1, \dots, x_n)$ of $\alpha_f$. Conversely, suppose that $\Pi' \circ [i]$ explains
$(x_1, \dots, x_n)$ of $\alpha_f$; by definition, there exists another input $(x'_1, \dots, x'_n)$ that
differs on $\Pi' \circ [i]$ and such that the output of $\alpha_f$ differs on $\pi \circ [i]$. Then $x_i$ and
$x'_i$ differ on $\Pi'$, and by the definition of $\alpha_f$, $f(x_i)$ and $f(x'_i)$ differ on $\pi$. If $\Pi'$
admits a proper part that satisfies this property, then $\Pi$ is not an explanation,
which contradicts the hypothesis. Hence $\Pi'$ is minimal, and it therefore explains
$\pi$ for $x_i$ of f.
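The lifting step of Theorem 4 (turning $\Pi'$ into $\Pi' \circ [i]$) can be sketched as follows. This is a hedged illustration only: designators are assumed to be tuples of 1-based indices followed from left to right, and `lift`, `explain_map` and `explain_product` are hypothetical helper names. The inner function f is taken to be the product of a vector, whose explanation follows Theorem 2.

```python
def explain_product(x, pi=()):
    """Explanations for the product of vector x (scalar output, so pi is unused)."""
    zeros = {(j,) for j, v in enumerate(x, start=1) if v == 0}
    if zeros:
        return [zeros]
    return [{(j,)} for j in range(1, len(x) + 1)]


def lift(explanations, i):
    """Prefix every designator with [i], i.e. turn each set P into P o [i]."""
    return [{(i,) + d for d in parts} for parts in explanations]


def explain_map(explain_f, xs, i, pi=()):
    """Explanations of part pi o [i] of the output of alpha_f on input xs."""
    return lift(explain_f(xs[i - 1], pi), i)


xs = [[2, 3], [0, 5, 0]]
print(explain_map(explain_product, xs, 1))  # [{(1, 1)}, {(1, 2)}]
print(explain_map(explain_product, xs, 2))
```

The second call yields a single explanation made of the designators `(2, 1)` and `(2, 3)`, i.e. the two zeros inside the second element of the input.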


Function $\omega_f^k$ applies a function f on a sliding window of width k to the input
vector. That is, the first element of the output vector is the result of evaluating
f on the first k elements; the second element is the evaluation of f on elements
at positions 2 to k + 1, and so on. If the input vector has fewer than k elements,
the function is defined to return a predefined value o. To establish the set of
minimal explanations, we define a special function $\sigma_i$; given a set of parts $\Pi$
such that all parts are of the form $\pi \circ [j]$, it replaces each of them by $\pi \circ [j - i]$. In
other words, parts pointing to a part $\pi$ of the j-th element of a vector end up
pointing to the same part $\pi$ of the (j − i)-th element of a vector.


Theorem 5. On a given input $(x_1, \dots, x_n)$, $\Pi$ explains $\pi \circ [i]$ of $\omega_f^k$ if and only
if $n \ge k$, $i \le n - k + 1$, $\Pi = \{\Pi' \circ [i]\}$ for some set of parts $\Pi'$, and $\sigma_i(\Pi')$ explains
$\pi$ for $x_i$ of f.


Proof. The proof is almost identical to that for $\alpha_f$, with the added twist that in the
i-th window, an explanation for f referring to a part of the j-th element of its
input vector actually refers to the (j − i)-th element of the input vector given to
$\omega_f^k$; this explains the presence of $\sigma_i$. We omit the details.
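The re-indexing between window positions and input positions can be made concrete with a short sketch. The code below works in the opposite direction to $\sigma_i$: it takes explanations computed on a window and shifts them back to positions of the whole input. It assumes 1-based positions, so that the j-th element of the i-th window is input element i + j − 1; this offset convention, like the names `explain_window` and `explain_product`, is our own assumption for illustration.

```python
def explain_product(x, pi=()):
    """Explanations for the product of vector x (see Theorem 2)."""
    zeros = {(j,) for j, v in enumerate(x, start=1) if v == 0}
    if zeros:
        return [zeros]
    return [{(j,)} for j in range(1, len(x) + 1)]


def explain_window(explain_f, xs, k, i, pi=()):
    """Explanations of part pi of the i-th output of a sliding window of width k."""
    n = len(xs)
    if n < k or not 1 <= i <= n - k + 1:
        return []  # no such output element
    window = xs[i - 1:i - 1 + k]
    shifted = []
    for parts in explain_f(window, pi):
        # Shift the first index from window-relative to input-relative.
        shifted.append({(d[0] + i - 1,) + d[1:] for d in parts})
    return shifted


xs = [1, 0, 2, 0, 3]
print(explain_window(explain_product, xs, 3, 2))  # window (0, 2, 0)
print(explain_window(explain_product, xs, 3, 3))  # [{(4,)}]
print(explain_window(explain_product, xs, 3, 4))  # []
```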
Finally, function ¿ acts as a form of if-then-else construct: depending on
the value of its (Boolean) first argument, it returns either its second or its third
argument (which can be arbitrary objects). Defining its explanation requires a
few cases, depending on whether the second and third element of the input are
equal.


Theorem 6. On a given input $(x_1, x_2, x_3)$, if $x_2 \neq x_3$, then $\{[1]\}$ always explains
$\pi$ of ¿; moreover $\{\pi \circ [2]\}$ explains $\pi$ of ¿ if $x_1 = \top$, and $\{\pi \circ [3]\}$ explains $\pi$ of ¿
if $x_1 = \bot$. However, if $x_2 = x_3$, then:

1. if $x_1 = \top$, $\Pi$ explains $\pi$ of ¿ if and only if $\Pi = \{\pi \circ [2]\}$ or $\Pi = \{[1], \pi \circ [3]\}$;
2. if $x_1 = \bot$, $\Pi$ explains $\pi$ of ¿ if and only if $\Pi = \{\pi \circ [3]\}$ or $\Pi = \{[1], \pi \circ [2]\}$.


Proof. Direct from the fact that ¿$((\top, x_2, x_3)) = x_2$, and ¿$((\bot, x_2, x_3)) = x_3$.
The only corner case is when $x_2 = x_3$; if $x_1 = \top$, one must change either $x_2$, or
both $x_1$ and $x_3$ in order to produce a different result (and dually when $x_1 = \bot$).
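The case analysis of Theorem 6 is compact enough to be written down directly. The sketch below is an illustration under simplifying assumptions: it only considers the whole output (the part $\pi$ is the identity), positions 1, 2 and 3 stand for the designators $[1]$, $[2]$ and $[3]$, and the name `explain_ite` is hypothetical.

```python
def explain_ite(x1, x2, x3):
    """Alternative explanations for the whole output of if-then-else."""
    chosen, other = (2, 3) if x1 else (3, 2)
    if x2 != x3:
        # Flipping the condition, or changing the selected branch, changes the output.
        return [{1}, {chosen}]
    # Equal branches: flipping the condition alone has no visible effect,
    # so it must be changed together with the branch that is not selected.
    return [{chosen}, {1, other}]


print(explain_ite(True, "a", "b"))  # [{1}, {2}]
print(explain_ite(True, 5, 5))      # [{2}, {1, 3}]
print(explain_ite(False, 5, 5))     # [{3}, {1, 2}]
```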


4.4 Explanation for Composed Functions 


Defining and proving conservative generalizations is a task that can quickly 
become tedious for complex functions, as the previous examples have shown 
us. Moreover, this process must be done from scratch for each new function one 
wishes to consider, as the proofs for each of them are quite different. In this 
section, we consider the situation where a complex function f is built through 
the composition of simpler functions. 

We first demonstrate a recipe for building conservative generalizations for
compositions of functions. In such a case, it is possible to derive a conservative
generalization for f by chaining and combining the generalizations already
obtained for the simpler functions it is made of. To ease notation, we shall write
$\Pi \Vdash_o \pi$ to indicate that $\Pi$ is a conservative generalization of all minimal input
parts that explain $\pi$ for input o. We extend the notion of generalization to sets
of output parts; for a set of parts $\Pi'$, we have that $\Pi \Vdash_o \Pi'$ if $\Pi \Vdash_o \pi'$ for every
$\pi' \in \Pi'$. We first trivially observe the following:


Theorem 7. For a given function $f : O \to O'$ and a given input $o \in O$, if
$\Pi_1 \Vdash_o \pi_1$ and $\Pi_2 \Vdash_o \pi_2$, then $\Pi_1 \cup \Pi_2 \Vdash_o \{\pi_1, \pi_2\}$.

Thus, given a set of output parts $\Pi'$, a conservative generalization can be
obtained by taking the union of the generalizations for each individual part in
$\Pi'$. We can then establish a result for the composition of two functions.


Theorem 8. Let $\pi$ be an output part of some function f, and let $o \in O$ be a given
input. Let $\Pi_f$ be a set of sets of parts such that $\Pi_f \Vdash_o \pi$ for function f. Let
$\Pi_g$ be a set of sets of parts such that $\Pi_g \Vdash_{f(o)} \Pi_f$ for some function g. Then
$\Pi_g \Vdash_o \pi$ for function $f \circ g$.


Proof. Suppose that there is a set of parts $\Pi$ that does not intersect with $\Pi_g$,
and such that for two inputs o and o′ that differ on $\Pi$, $\pi((f \circ g)(o)) \neq \pi((f \circ g)(o'))$.
Let $x = g(o)$ and $y = g(o')$; since $\pi(f(x)) \neq \pi(f(y))$, then x and y differ on a
set of parts $\Pi'$. By definition, a refinement of $\Pi'$, called $\Pi''$, is also a refinement
of $\Pi_f$. Since $\Pi_g \Vdash_{f(o)} \Pi_f$, this implies that o and o′ differ on a part that
intersects with $\Pi_g$, which contradicts the hypothesis. It follows that all sets of
parts of $f \circ g$ that explain o for $\pi$ intersect with $\Pi_g$, and hence $\Pi_g \Vdash_o \pi$ for
function $f \circ g$.


Thanks to this result, one can obtain a conservative generalization for $f \circ g$
by first finding a conservative generalization $\Pi$ of $\pi$ for f, and then finding
a conservative generalization $\Pi'$ of $\Pi$ for g. This spares us from defining
an input-output explanation relation for each possible function, at the price of
obtaining a conservative approximation of the actual relation. When expressing
these explanations as and-or trees, this simply amounts to appending the
root of an explanation (the output of a function) to the leaf designating the
corresponding input of the function it is composed with.
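The chaining described by Theorems 7 and 8 can be sketched as follows. This is a minimal illustration and not the paper's construction in full: a generalization is modelled (by assumption) as a Python function mapping an output part to a set of frozensets of input parts, and the names `union_generalization`, `compose_generalization`, `gen_sum` and `gen_window_sum` are hypothetical. The example composes a sum over three values (f) with a sliding-window sum of width 2 over four values (g).

```python
def union_generalization(gen, parts):
    """Theorem 7: generalize a set of parts by taking the union over each part."""
    result = set()
    for part in parts:
        result |= gen(part)
    return result


def compose_generalization(gen_f, gen_g, pi):
    """Theorem 8 (sketch): chain the generalization of f with that of g."""
    result = set()
    for parts in gen_f(pi):
        result |= union_generalization(gen_g, parts)
    return result


def gen_sum(pi):
    """f: sum of a 3-vector; each single element is an explanation."""
    return {frozenset({(i,)}) for i in (1, 2, 3)}


def gen_window_sum(part):
    """g: i-th sum over a window of width 2; either element of the window."""
    (i,) = part
    return {frozenset({(i,)}), frozenset({(i + 1,)})}


result = compose_generalization(gen_f=gen_sum, gen_g=gen_window_sum, pi=())
print(sorted(sorted(s) for s in result))  # [[(1,)], [(2,)], [(3,)], [(4,)]]
```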

We recall that one of the claimed features of our proposed approach was 
tractability. The theorems stated throughout this section give credence to this 
claim. One can see that, for each of the elementary functions we studied in 
Sects. 4.2 and 4.3, determining the sets of input parts that are (conservative) 
explanations of an output part can be done by applying simple rules that require 
no particular calculation. Then, building an explanation for a composed function 
is not much harder, and requires properly matching the output parts of a function 
to the input parts of the one it calls. 


5 Implementation and Experiments 


Combined, the previous results make it possible to systematically construct 
explanations for a wide range of computations. It suffices to observe that nested 
lists-of-lists, coupled with the functions defined in Sect. 4, represent a significant 
fragment of a functional programming language such as Lisp. 

To illustrate this point, the concepts introduced above have been concretely
implemented into Petit Poucet³, an open source Java library.⁴ The library allows
users to create complex functions by composing the elementary functions studied
earlier, to evaluate these functions on inputs, and to generate the corresponding
explanation graphs. This library is meant as a proof of concept that serves two
goals: 1. show the feasibility of our proposed theoretical framework and provide
initial results on its running time and memory consumption; 2. provide a test
bench allowing us to study the explanation graphs of various functions for various
inputs.

³ In English Hop-o’-My-Thumb, a fairy tale by French writer Charles Perrault where
the main character uses stones to mark a trail that enables him to successfully lead
his lost brothers back home.

⁴ https://github.com/liflab/petitpoucet. Version 1.0 is considered in this paper.


5.1 Library Overview 


Petit Poucet provides a set of ready-made Function objects; in its current
implementation, it contains all the functions defined in Sect. 4, in addition to a few
others for number comparison, type conversion, descriptive statistics (e.g. average),
basic I/O (reading and writing to files) and string and list manipulation.
Composed functions are created by adding elementary functions into an object
called CircuitFunction, and manually connecting the output of each function
instance to the input argument of another.

Figure 5 shows a graphical representation of a complex function that can be
created by instantiating and composing elementary functions of the library. Each
white box represents an elementary function; composition is illustrated by lines
connecting the output (dark square) of a function to the input (light square) of
another. For functions taking other functions as parameters, namely $\alpha_f$ and $\omega_f$,
the parameterized function is represented by a rectangle attached with a dotted
line (such as box A attached to box 2). The composed function shown here is
exactly the one from our example in the introduction: from a CSV file (box 1),
the second element of each line is extracted and cast to a number (boxes 2 and
A), the average over a sliding window is taken (boxes 3 and B), each value is
checked to be greater than 3 (boxes 4 and C), and the logical conjunction of all
these values is taken (box 5).


Fig. 5. Evaluating a condition on the average of values over a sliding window.


Once created, a function can be evaluated with input arguments. When this
happens, it returns a special object called a Queryable. The purpose of the
queryable is to retain the information about the function’s evaluation necessary
to answer “queries” about it at a later time. Each evaluation of the function
produces a distinct queryable object with its own memory. Given a designator
pointing to a part of the function’s output, calling a Queryable’s method query
produces the corresponding explanation and-or tree. Typically, one is interested
in a simplified rendition displaying only the root, leaves and Boolean nodes,
hiding the intermediate nodes made of the input/output of the intermediate
functions in the explanation. On the CSV file shown in the introduction, the
library produces the tree that is shown in Fig. 1.⁵
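The evaluate-then-query pattern can be illustrated with a toy model. The sketch below is written in Python and is not the Petit Poucet API: only the names Queryable and query come from the text above, while the class `Conjunction`, its `evaluate` method, and the tuple encoding of and-or trees are assumptions made for illustration.

```python
class Queryable:
    """Retains what one evaluation needs in order to answer queries later."""

    def __init__(self, explain, inputs):
        self._explain = explain
        self._inputs = inputs

    def query(self, designator=()):
        return self._explain(self._inputs, designator)


class Conjunction:
    """Toy function: logical conjunction of a vector of Booleans."""

    def evaluate(self, bs):
        # Each evaluation returns its own Queryable with its own memory.
        return all(bs), Queryable(self._explain, list(bs))

    @staticmethod
    def _explain(bs, designator):
        false_positions = [i for i, b in enumerate(bs, start=1) if not b]
        if false_positions:
            return ("and", false_positions)
        return ("or", list(range(1, len(bs) + 1)))


value, queryable = Conjunction().evaluate([True, False, False])
print(value)              # False
print(queryable.query())  # ('and', [2, 3])
```

Separating evaluation from querying means that the output part to explain does not have to be known when the function is evaluated.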

One can see that the false result produced by the function admits three alternate
explanations (the three sub-trees under the “or” node). The first explanation
involves the numerical values in lines 3-4-5 (children of the first “and” node),
the second includes lines 4-5-6, and the third explanation is made of the numbers
in lines 5-6-7. This corresponds exactly to the three windows of successive
values whose average is not greater than 3. Indeed, the presence of any of these
three windows, and nothing else, suffices for our global condition to evaluate
to false. Note how the explanation generation mechanism correctly and automatically
identifies these, and also how, thanks to our concept of designator, the
explanation can refer to specific locations inside specific lines of the input object.
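The structure of this result (an "or" over windows, each being an "and" over lines) can be reproduced with a few lines of Python. The sketch below is only an illustration: the function name `failing_windows` is ours, and the input values are hypothetical, chosen so that the failing windows match the line numbers mentioned above; they are not the contents of the CSV file from the introduction.

```python
def failing_windows(values, k, threshold):
    """Return, as lists of 1-based line numbers, the windows of width k
    whose average is not greater than the threshold."""
    windows = []
    for start in range(len(values) - k + 1):
        window = values[start:start + k]
        if sum(window) / k <= threshold:
            windows.append(list(range(start + 1, start + k + 1)))
    return windows


# Hypothetical column of numbers, one per line of the file.
values = [10, 10, 2, 3, -80, 5, 4, 100, 100]
print(failing_windows(values, 3, 3))  # [[3, 4, 5], [4, 5, 6], [5, 6, 7]]
```

Each inner list corresponds to one "and" node, and the outer list to the "or" node at the root of the tree.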

In Petit Poucet, lineage capabilities are built-in. The user is not required to 
perform any special task in order to keep track of provenance information. In 
addition, it shall be noted that one does not need to declare in advance what 
designation graph will be asked for. The construction of the Queryable objects is 
the same, regardless of the output part being used as the starting point. Finally, 
the library follows a modular architecture where the set of available functions can 
easily be extended by creating packages defining new descendants of Function. 
It suffices for each function to produce a Queryable object that computes its 
specific input-output relationships; an explanation can then be computed for any 
composed function that uses it. 


5.2 Experiments 


To test the performance of the library, we selected various data processing tasks 
and implemented them as composed functions in Petit Poucet. 


Get All Numbers. Represents a simple operation that takes an input comma- 
separated list of elements, and produces a vector containing only the elements of 
the file that are numerical values. The explanation we ask is to point at a given 
element of the output vector, and retrace the location in the input string that 
corresponds to this number. 


Sliding Window Average. Given a CSV file, this task extracts the numerical
value in each line, computes the average of each set of n successive values and
checks that it is below some threshold t (similar to the example we discussed
earlier). Computing an aggregation over a sliding window is a common task in
the field of event stream processing [26] and runtime verification [4], and is also
provided by most statistical software, such as R’s smooth package. It can be
seen as a basic form of trend deviation detection [41], where the end result of
the calculation is an “alarm” indicating that the expected trend has not been
followed across the whole data file; a classical example of this is the detection
of temperature peaks in a server rack [40]. The explanation we ask is to point
at the output Boolean value (true or false), and retrace the locations in the file
corresponding to the numbers explaining the result.

⁵ Or more precisely a directed acyclic graph, since leaf nodes with the same designator
are not repeated.


Triangle Areas. Given a list of arbitrary vectors, this task checks that each vector
contains the lengths of the three sides of a valid triangle; if so, it computes its
area using Heron’s formula⁶ and sums the area of all valid triangles. It was chosen
because it involves multiple if-then-else cases to verify the sides of a triangle. It
also involves a slightly more involved arithmetical calculation to get the area,
which is completely implemented by composing basic arithmetic operators in
a composed function. The whole function is the composition of 59 elementary
functions.

This example is notable, because the explanations it generates may take 
different forms. An explanation for the output value (total area) includes an 
explanation for each vector: if it represents a valid triangle, it refers to its three 
sides; if it does not, the condition can fail for different reasons: the vector may 
not have three elements, or have one of its elements that is not a number, or 
contain a negative value, or violate the triangle inequality. Each condition, when 
violated, produces different explanations pointing at different elements of the 
vector, or the vector as a whole. Moreover, each vector in the list may not be 
a valid triangle for different reasons, and hence a different explanation will be 
built for each of them. For example, given the list [(a, 4, 2), (3, 5,6), (2,3)], a tree 
will be produced that describes an explanation involving three elements: the first 
points at the element a of the first vector (not a number), the second points at 
all three components of the second vector (valid triangle), and the last points at 
the whole third vector (wrong number of elements). 
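The case analysis of this task can be sketched directly in Python. The code below is a hedged illustration, not the composed function of 59 elementary functions used in the experiments: designators are tuples of 1-based indices, the helper names are hypothetical, and the choice to point at all three sides when the triangle inequality is violated is our own assumption, since the text does not specify it.

```python
import math
from numbers import Real


def explain_triangle(vector, index):
    """Return (area, designators) for the vector at 1-based position `index`."""
    if len(vector) != 3:
        return 0.0, [(index,)]  # wrong number of elements: the whole vector
    positions = list(enumerate(vector, start=1))
    not_numbers = [(index, j) for j, v in positions
                   if isinstance(v, bool) or not isinstance(v, Real)]
    if not_numbers:
        return 0.0, not_numbers
    negative = [(index, j) for j, v in positions if v < 0]
    if negative:
        return 0.0, negative
    sides = [(index, j) for j in (1, 2, 3)]
    a, b, c = vector
    if a + b <= c or a + c <= b or b + c <= a:
        return 0.0, sides  # assumption: all three sides are involved
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c)), sides  # Heron's formula


def total_area(vectors):
    total, explanation = 0.0, []
    for index, vector in enumerate(vectors, start=1):
        area, parts = explain_triangle(vector, index)
        total += area
        explanation.append(parts)
    return total, explanation


total, explanation = total_area([("a", 4, 2), (3, 5, 6), (2, 3)])
print(explanation)  # [[(1, 1)], [(2, 1), (2, 2), (2, 3)], [(3,)]]
```

On the example list from the text, the three entries of the explanation point at the element a, at the three sides of the valid triangle, and at the whole third vector, respectively.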


Nested Bounding Boxes. Given a DOM tree⁷, this task checks that each element
has a bounding box (width and height) larger than all of its children. A violation
of this condition is the symptom of a layout bug which shows visually as an element
protruding from its parent box inside the web browser’s window. This corresponds
to one of the properties that is evaluated by web testing tools such as Cornipickle
[24] and ReDeCheck [49] on real web pages. In this task, trees are represented
as nested lists-of-lists, with each DOM node corresponding to a triplet made of
its width, height, and a list of its children nodes. The explanation we ask is to
point at the Boolean output of the condition, and retrace the nodes of the tree
that violate it.


⁶ $A = \sqrt{s(s - a)(s - b)(s - c)}$, where $s = \frac{a + b + c}{2}$.

⁷ A DOM tree represents the structure of elements in an HTML document [2].

In all these tasks, the inputs given to the function are randomly generated
structures of the corresponding type. The experiments were implemented using
the LabPal testing framework [25], which makes it possible to bundle all the necessary
code, libraries and input data within a single self-contained executable file,
such that anyone can download and independently reproduce the experiments.
A downloadable lab instance containing all the experiments of this paper can be
obtained online [27]. Overall, our empirical measurements involve 56 individual
experiments, which together generated 224 distinct data points. All the experiments
were run on an AMD Athlon II X4 640 1.8 GHz running Ubuntu 18.04,
inside a Java 8 virtual machine with 3566 MB of memory.


Memory Consumption. The first experiment aims to measure the amount
of memory used up by the Queryable objects generated by the evaluation of a
function, and the impact of the size of the input on global memory consumption.
To this end, we ran various functions on inputs of different size; for each,
we measured the amount of memory consumed, with explainability successively
turned on and off. This is possible thanks to a switch provided by Petit Poucet,
which allows users to completely disable tracking if desired.

Fig. 6. Impact of explainability on memory consumption.


Figure 6 shows a plot that compares the amount of memory consumed by
Function objects. Each point in the plot corresponds to a pair of experiments:
the x coordinate corresponds to the memory consumed by a function without
explainability, and the y coordinate corresponds to the memory consumed by
the same function on the same input, but with explainability enabled. All the
points for the same task have been grouped into a category and are given the
same color.

Analyzing this plot brings both bad news and good news. The “bad” news is
that the additional memory required for explainability is high when expressed in
relative terms. For example, a composed function that requires 498 KB to be evaluated
on an input requires close to 15 MB once explainability is enabled. The “good”
news is twofold. First, this consumption is still reasonable in absolute terms: at this
rate, it takes an input file of 42 million lines before filling up the available RAM
in a 64 GB machine. Second, and most importantly, the relationship between
memory consumption with and without lineage is linear: for all the tasks we
tested, if m is the memory used without lineage, then the memory m’ used when
lineage is enabled is in O(m), i.e. the ratio m’/m is bounded by a constant that
does not depend on the size of the input.

These figures should be put in context by comparing the overhead incurred by
other systems mentioned in Sect. 2. Related systems for provenance in databases
(namely Polygen [51], MONDRIAN [20], MXQL [48], DBNotes [11], pSQL [7] and
ORCHESTRA [29]) do not divulge their storage overhead for provenance data.
A recent technical report on a provenance-aware database management system
measures an overhead ranging between 19% and 702% [3]. Dynamic taint propagation
systems report a memory overhead reaching 4x for TaintCheck [37], 240x
for Dytan [14], and “an enormity” of logging information for RIFLE ([13], authors’
quote). Although these systems operate at a different level of abstraction, this
shows that explainability is inherently costly regardless of the approach chosen.


Computation Time. We performed the same experiments, but this time measuring
computation time. The results are shown in Fig. 7; similar to memory consumption,
they compare the running time of the same function on the same input,
both with and without explanation tracking. The largest slowdown observed
across all instances is 6.7x. For a task like Sliding Window Average, the average
slowdown observed is 1.93x across all inputs. Although this slowdown is
non-negligible, it is reasonable nevertheless, adding at most a few hundred
milliseconds on the problems considered in our benchmark.

Fig. 7. Impact of explainability on computation time.


Again, these results should be put in context with respect to existing works
that include a form of lineage. The MONDRIAN system reports an average slowdown
of 3x; pSQL ranges between 10x and 1,000x; the remaining tools do not
report CPU overhead. For taint analysis tools, Dytan reports a 30-50x slowdown;
GIFT-compiled programs are slowed down by up to 12x; TaintCheck has
a slowdown of around 20x, and RIFLE of 1-2x. Of
course, these various systems compute different types of lineage information, but
these figures give an outlook of the order of magnitude one should expect from
such systems.


6 Conclusion 


This paper provided the formal foundations for a generic and granular explainability
framework. An important highlight of this model is its capability to handle
abstract composite data structures, including character strings or lists of elements.
The paper then defined the notion of designators, which are functions that
can point to and extract parts of these data structures. An explainability relationship
on functions has been formally defined, and conservative approximations
of this relation have been proved for a set of elementary functions. A point in 
favor of this approach is that explanations of composed functions can be built by 
composing the explanations for elementary functions. Combined, these concepts 
make it possible to automatically extract the explanation of a result for generic 
functions at a fine level of granularity. These concepts have been implemented 
into a proof-of-concept, yet fully functional library called Petit Poucet, and eval- 
uated experimentally on a number of data processing tasks. These experiments 
revealed that the amount of memory required to track explainability metadata 
is relatively high, but more importantly, showed that it is linear in the size of 
the memory required to evaluate the function in the first place. 

Obviously, Petit Poucet is not intended to replace programs written using
other languages and following different paradigms. However, it could be used as
a library by other tools that could benefit from its explanation features. In particular,
testing libraries such as JUnit could be extended by assertions written
as Petit Poucet functions, and provide a detailed explanation of a test failure
without requiring extra code. Explainability functionalities could also easily be
retrofitted into existing (Java) software, with minimal interference on their current
code. Case in point, we already identified the Cornipickle web testing tool
[24] and the BeepBeep event stream processing engine [26] as some of the first
targets for the addition of explainability based on Petit Poucet. A lineage-aware
version of the GRAL plotting library⁸ is also considered.

The existence of a definition of fine-grained explainability opens the way to
multiple exciting theoretical questions. For example: for a given function, is there
a part of the input that is present in all explanations? We can see an example
of this in Fig. 1, with the leaf pointing to value −80. Intuitively, this tends to
indicate that some parts of an input have a greater “responsibility” than others
in the result, and could provide an alternate way of quantifying this notion than
what has been studied so far [12]. On the contrary, is there a part of the input
that never explains the production of the output, regardless of the input? This
latter question could shed a different light on an existing notion called vacuity
[6], expressed not in terms of elements of the specification, but on the parts of
the input it is evaluated on. More generally, explainability can be viewed as a
particular form of static analysis for functions; it would therefore be interesting
to recast our model in the abstract interpretation framework [35,39] in order to
further assess its strengths and weaknesses.

⁸ https://github.com/eseifert/gral.
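The first of these questions has a simple computational reading: a part that is present in all explanations lies in the intersection of the alternative explanations. The sketch below illustrates this on the three windows of line numbers discussed in Sect. 5.1; the function name `parts_in_all_explanations` is hypothetical, and representing each explanation as a set of line numbers is a simplification made for illustration.

```python
def parts_in_all_explanations(alternatives):
    """Return the parts shared by every alternative explanation."""
    alternatives = [set(a) for a in alternatives]
    if not alternatives:
        return set()
    return set.intersection(*alternatives)


# The three alternative explanations of Sect. 5.1, as sets of line numbers.
print(parts_in_all_explanations([{3, 4, 5}, {4, 5, 6}, {5, 6, 7}]))  # {5}
```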

Finally, explanations could also prove useful from a testing and verification 
standpoint. The explanation graph could be used for log trace and bug triaging 
[30]: if two execution traces violate the same condition, one could keep one trace 
instance for each distinct explanation they induce, as representatives of traces 
that fail for different reasons. This could help reduce the amount of log data that 
needs to be preserved, by keeping only one log instance of each type of failure. 
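A minimal sketch of this triaging idea is given below. It is an illustration under simple assumptions: traces are lists of numbers, the violated condition is "all values are non-negative", and the explanation of a failure is the set of positions holding a negative value. The names `triage` and `negative_positions` are hypothetical.

```python
def triage(traces, explain):
    """Keep one representative trace per distinct explanation."""
    representatives = {}
    for trace in traces:
        key = frozenset(explain(trace))
        # Only the first trace seen for each explanation is kept.
        representatives.setdefault(key, trace)
    return list(representatives.values())


def negative_positions(trace):
    """Explanation of a failure: 1-based positions of the negative values."""
    return {i for i, v in enumerate(trace, start=1) if v < 0}


traces = [[1, -2, 3], [4, -5, 6], [-7, 8, 9]]
print(triage(traces, negative_positions))  # [[1, -2, 3], [-7, 8, 9]]
```

The first two traces fail for the same reason (position 2), so only one of them is kept.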


References 


1. Aleksandrowicz, G., Chockler, H., Halpern, J.Y., Ivrii, A.: The computational complexity of structure-based causality. J. Artif. Intell. Res. 58, 431-451 (2017)

2. Apparao, V., et al.: Document object model (DOM) level 1 specification. Technical report, World Wide Web Consortium (1998). https://www.w3.org/DOM/. Accessed 17 Nov 2019

3. Arab, B., Gawlick, D., Krishnaswamy, V., Radhakrishnan, V., Glavic, B.: Formal foundations of reenactment and transaction provenance. Technical Report IIT/CS-DB-2016-01, Illinois Institute of Technology (2016)

4. Basin, D.A., Klaedtke, F., Marinovic, S., Zalinescu, E.: Monitoring of temporal first-order properties with aggregations. Formal Methods Syst. Des. 46(3), 262-285 (2015)

5. Beer, I., Ben-David, S., Chockler, H., Orni, A., Trefler, R.J.: Explaining counterexamples using causality. Formal Methods Syst. Des. 40(1), 20-40 (2012)

6. Ben-David, S., Copty, F., Fisman, D., Ruah, S.: Vacuity in practice: temporal antecedent failure. Formal Methods Syst. Des. 46(1), 81-104 (2015). https://doi.org/10.1007/s10703-014-0221-0

7. Bhagwat, D., Chiticariu, L., Tan, W.C., Vijayvargiya, G.: An annotation management system for relational databases. VLDB J. 14(4), 373-396 (2005). https://doi.org/10.1007/s00778-005-0156-6

8. Buneman, P., Khanna, S., Wang-Chiew, T.: Why and where: a characterization of data provenance. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 316-330. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44503-X_20

9. Cheney, J., Chiticariu, L., Tan, W.-C.: Provenance in databases: why, how, and where. Found. Trends Databases 1(4), 379-474 (2007)

10. Chiticariu, L., Tan, W.C.: Debugging schema mappings with routes. In: Dayal, U., et al. (eds.) Proceedings of the VLDB 2006, pp. 79-90. ACM (2006)

11. Chiticariu, L., Tan, W.C., Vijayvargiya, G.: DBNotes: a post-it system for relational databases based on provenance. In: Ozcan, F. (ed.) Proceedings of the SIGMOD 2005, pp. 942-944. ACM (2005)

12. Chockler, H., Halpern, J.Y.: Responsibility and blame: A structural-model approach. J. Artif. Intell. Res. 22, 93-115 (2004)


13. 


14. 


15. 


16. 


17. 


18. 


19. 


20. 


21. 


22. 


23. 


24. 


25. 


26. 


27: 


28. 


29. 


30. 


3l. 


32. 


Foundations of Fine-Grained Explainability 521 


Chow, J., Pfaff, B., Garfinkel, T., Christopher, K., Rosenblum, M.: Understanding 
data lifetime via whole system simulation. In: Blaze, M. (ed.) Proceedings of the 
USENIX Security 2004, pp. 321-336. USENIX (2004) 

Clause, J.A., Li, W., Orso, A.: Dytan: a generic dynamic taint analysis framework. 
In: Rosenblum, D.S., Elbaum, S.G. (eds.) Proceedings of the ISSTA 2007, pp. 196- 
206. ACM (2007) 

Cleve, H., Zeller, A.: Locating causes of program failures. In: Roman, G., Griswold, 
W.G., Nuseibeh, B. (eds.) Proceedings of the ICSE 2005, pp. 342-351. ACM (2005) 
Crandall, J.R., Chong, F.T.: Minos: control data attack prevention orthogonal to 
memory model. In: Proceedings of the MICRO-37, pp. 221-232. IEEE Computer 
Society (2004) 

Cui, Y., Widom, J., Wiener, J.L.: Tracing the lineage of view data in a warehousing 
environment. ACM Trans. Database Syst. 25(2), 179-227 (2000) 

Eiter, T., Lukasiewicz, T.: Complexity results for structure-based causality. Artif. 
Intell. 142(1), 53-89 (2002) 

Ferrère, T., Maler, O., Ničković, D.: Trace diagnostics using temporal implicants. 
In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) ATVA 2015. LNCS, vol. 9364, pp. 
241-258. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24953-7_20 
Geerts, F., Kementsietsidis, A., Milano, D.: MONDRIAN: annotating and querying 
databases through colors and blocks. In: Liu, L., Reuter, A., Whang, K., Zhang, J. 
(eds.) Proceedings of the ICDE 2006, pp. 82. IEEE Computer Society (2006) 
Green, T.J., Karvounarakis, G., Tannen. Provenance semirings. In: Libkin, L. (ed.) 
Proceedings of the PODS 2007, pp. 31-40. ACM (2007) 

Groce, A., Chaki, S., Kroening, D., Strichman, O.: Error explanation with distance 
metrics. STTT 8(3), 229-247 (2006) 

Hallé, S.: Causality in message-based contract violations: a temporal logic “who- 
dunit”. In: Proceedings of the EDOC 2011, pp. 171-180. IEEE Computer Society 
(2011) 
Hallé, S., Bergeron, N., Guérin, F., Le Breton, G., Beroual, O.: Declarative layout 
constraints for testing web applications. J. Log. Algebraic Meth. Program. 85(5), 
737-758 (2016) 

Hallé, S., Khoury, R., Awesso, M.: Streamlining the inclusion of computer experi- 
ments in a research paper. IEEE Comput. 51(11), 78-89 (2018) 

Hallé, S.: Event Stream Processing With BeepBeep 3: Log Crunching and Analysis 
Made Easy. Presses de l’Université du Québec (2018) 

Hallé, S., Tremblay, H.: Measuring the impact of lineage tracking in the Petit 
Poucet library (version v1.0), April 2021 

Halpern, J.Y., Pearl, J.: Causes and explanations: a structural-model approach, 
part I: causes. Brit. J. Philos. Sci. 56(4), 843-887 (2005) 

Karvounarakis, G., Ives, Z.G., Tannen, V.: Querying data provenance. In: Elma- 
garmid, A.K., Agrawal, D. (eds.) Proceedings of the SIGMOD 2010, pp. 951-962. 
ACM (2010) 

Khoury, R., Gaboury, S., Hallé, S.: Three views of log trace triaging. In: Cuppens, 
F., Wang, L., Cuppens-Boulahia, N., Tawbi, N., Garcia-Alfaro, J. (eds.) FPS 2016. 
LNCS, vol. 10128, pp. 179-195. Springer, Cham (2017). https://doi.org/10.1007/ 
978-3-319-51966-1_12 

Kupferman, O., Vardi, M.Y.: Model checking of safety properties. Formal Methods 
Syst. Des. 19(3), 291-314 (2001). https://doi.org/10.1023/A:1011254632723 
Lam, L., Chiueh, T.-C.: A General dynamic information flow tracking framework 
for security applications. In Proceedings of the ACSAC 2006, pp. 463-472, Miami 
Beach, FL, USA, December 2006. IEEE 


33. Leek, T., Brown, R., Zhivich, M., Lippmann, R.: Coverage maximization using
dynamic taint tracing. Technical Report 1112, Massachusetts Institute of
Technology (2007)

34. McCamant, S., Ernst, M.D.: Quantitative information flow as network flow
capacity. In: Gupta, R., Amarasinghe, S.P. (eds.) Proceedings of the PLDI 2008,
pp. 193-205. ACM (2008)

35. Møller, A., Schwartzbach, M.I.: Static program analysis, October 2018. Department
of Computer Science, Aarhus University. http://cs.au.dk/~amoeller/spa/

36. Mukherjee, S., Dasgupta, P.: Computing minimal debugging windows in failure
traces of AMS assertions. IEEE Trans. CAD Integr. Circ. Syst. 31(11), 1776-1781
(2012)

37. Newsome, J., Song, D.X.: Dynamic taint analysis for automatic detection, analysis,
and signature generation of exploits on commodity software. In: Proceedings of the
NDSS 2005. The Internet Society (2005)

38. Pérez, B., Rubio, J., Sáenz-Adán, C.: A systematic review of provenance systems.
Knowl. Inf. Syst. 57(3), 495-543 (2018).
https://doi.org/10.1007/s10115-018-1164-3

39. Rival, X., Yi, K.: Introduction to Static Analysis: An Abstract Interpretation
Perspective. MIT Press, Cambridge (2020)

40. Rohrmann, T.: Introducing complex event processing (CEP) with Apache Flink,
2016. https://flink.apache.org/news/2016/04/06/cep-monitoring.html. Accessed
17 Nov 2019

41. Roudjane, M., Rebaine, D., Khoury, R., Hallé, S.: Detecting trend deviations with
generic stream processing patterns. Inf. Syst. 101446, 1-24 (2019).
https://doi.org/10.1016/j.is.2019.101446

42. Samek, W., Wiegand, T., Müller, K.-R.: Explainable artificial intelligence:
understanding, visualizing and interpreting deep learning models. ITU J. 1 (2017).
arXiv:1708.08296

43. Schuppan, V., Biere, A.: Shortest counterexamples for symbolic model checking
of LTL with past. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS,
vol. 3440, pp. 493-509. Springer, Heidelberg (2005).
https://doi.org/10.1007/978-3-540-31980-1_32

44. Suh, G.E., Lee, J.W., Zhang, D., Devadas, S.: Secure program execution via
dynamic information flow tracking. In: Mukherjee, S., McKinley, K.S. (eds.)
Proceedings of the ASPLOS 2004, pp. 85-96. ACM (2004)

45. Vachharajani, N., et al.: RIFLE: an architectural framework for user-centric
information-flow security. In: Proceedings of the MICRO-37, pp. 243-254. IEEE
Computer Society (2004)

46. van Zelst, S.J., Bolt, A., Hassani, M., van Dongen, B.F., van der Aalst, W.M.P.:
Online conformance checking: relating event streams to process models using
prefix-alignments. Int. J. Data Sci. Anal. 8(3), 269-284 (2019).
https://doi.org/10.1007/s41060-017-0078-6

47. Vandebogart, S., et al.: Labels and event processes in the Asbestos operating
system. ACM Trans. Comput. Syst. 25(4), 11 (2007)

48. Velegrakis, Y., Miller, R.J., Mylopoulos, J.: Representing and querying data
transformations. In: Aberer, K., Franklin, M.J., Nishio, S. (eds.) Proceedings of the
ICDE 2005, pp. 81-92. IEEE Computer Society (2005)

49. Walsh, T.A., McMinn, P., Kapfhammer, G.M.: Automatic detection of potential
layout faults following changes to responsive web pages. In: Proceedings of the
ASE 2015, pp. 709-714. ACM (2015)




50. Wang, C., Yang, Z., Ivančić, F., Gupta, A.: Whodunit? Causal analysis for
counterexamples. In: Graf, S., Zhang, W. (eds.) ATVA 2006. LNCS, vol. 4218,
pp. 82-95. Springer, Heidelberg (2006). https://doi.org/10.1007/11901914_9

51. Wang, Y.R., Madnick, S.E.: A polygen model for heterogeneous database systems: 
the source tagging perspective. In: McLeod, D., Sacks-Davis, R., Schek, H. (eds.) 
Proceedings of the VLDB 1990, pp. 519-538. Morgan Kaufmann (1990) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




Latticed k-Induction with an Application 
to Probabilistic Programs 


Kevin Batz¹, Mingshuai Chen¹, Benjamin Lucien Kaminski²,
Joost-Pieter Katoen¹(✉), Christoph Matheja³, and Philipp Schröer¹


¹ RWTH Aachen University, Aachen, Germany
{kevin.batz,chenms,katoen}@cs.rwth-aachen.de
² University College London, London, UK
b.kaminski@ucl.ac.uk
³ ETH Zürich, Zürich, Switzerland
cmatheja@inf.ethz.ch


Abstract. We revisit two well-established verification techniques, k-in- 
duction and bounded model checking (BMC), in the more general setting 
of fixed point theory over complete lattices. Our main theoretical contri- 
bution is latticed k-induction, which (i) generalizes classical k-induction 
for verifying transition systems, (ii) generalizes Park induction for bounding
fixed points of monotonic maps on complete lattices, and (iii) extends
from naturals k to transfinite ordinals κ, thus yielding κ-induction.

The lattice-theoretic understanding of k-induction and BMC enables 
us to apply both techniques to the fully automatic verification of infinite- 
state probabilistic programs. Our prototypical implementation manages 
to automatically verify non-trivial specifications for probabilistic pro- 
grams taken from the literature that—using existing techniques—cannot 
be verified without synthesizing a stronger inductive invariant first. 


Keywords: k-induction · Bounded model checking · Fixed point
theory · Probabilistic programs · Quantitative verification


1 Introduction 


Bounded model checking (BMC) [12,17] is a successful method for analyzing
models of hardware and software systems. For checking a finite-state transition
system (TS) against a safety property (“bad states are unreachable”), BMC
unrolls the transition relation until it either finds a counterexample and hence
refutes the property, or reaches a pre-computed completeness threshold on the
unrolling depth and accepts the property as verified. For infinite-state systems,
however, such completeness thresholds need not exist (cf. [64]), rendering BMC a
refutation-only technique. To verify infinite-state systems, BMC is typically
combined with the search for an inductive invariant, i.e., a superset of the reachable
states which is closed under the transition relation. Proving a (not necessarily
inductive) safety property then amounts to synthesizing a sufficiently strong,
often complicated, inductive invariant that excludes the bad states. A plethora
of techniques target computing or approximating inductive invariants, including
IC3 [14], induction [13,20], interpolation [50,51], and predicate abstraction
[27,36]. However, invariant synthesis may burden full automation, as it
either relies on user-supplied annotations or confines push-button technologies
to semi-decision or approximate procedures.

This work has been partially funded by the ERC Advanced Project FRAPPANT under
grant No. 787914.
© The Author(s) 2021
A. Silva and K. R. M. Leino (Eds.): CAV 2021, LNCS 12760, pp. 524-549, 2021.
https://doi.org/10.1007/978-3-030-81688-9_25

k-induction [65] generalizes the principle of simple induction (aka 1-induction)
by considering k consecutive transition steps instead of only a single
one. It is more powerful: an invariant can be k-inductive for some k > 1 but not
1-inductive. Following the seminal work of Sheeran et al. [65], which combines
k-induction with SAT solving to check safety properties, k-induction has found
a broad spectrum of applications in the realm of hardware [29,37,45,65] and
software verification [10,21-23,55,63]. Its success is due to (1) being a
foundational yet potent reasoning technique, and (2) integrating well with SAT/SMT
solvers, as also pointed out in [45]: “the simplicity of applying k-induction made
it the go-to technique for SMT-based infinite-state model checking”. This paper
explores whether k-induction can have a similar impact on the fully automatic
verification of infinite-state probabilistic programs. That is, we aim to verify that
the expected value of a specified quantity (think: “quantitative postcondition”)
after the execution of a probabilistic program is bounded by a specified threshold.


Example 1 (Bounded Retransmission Protocol [19,32]). The loop


    while (sent < toSend ∧ fail < maxFail) {
        { fail := 0; sent := sent + 1 } [0.9] { fail := fail + 1; totalFail := totalFail + 1 }
    }


models a simplified version of the bounded retransmission protocol, which
attempts to transmit toSend packages via an unreliable channel (that fails with
probability 0.1) allowing for at most maxFail retransmissions per package.

Using our generalization of k-induction, we can fully automatically verify that
the expected total number of failed transmissions is at most 1, if the number
of packages we want to (successfully) send is at most 3. In terms of weakest
preexpectations [38,44,49], this quantitative property reads


$$\mathrm{wp}\llbracket C \rrbracket(\mathit{totalFail}) \;\preceq\; [\mathit{toSend} \leq 3] \cdot (\mathit{totalFail} + 1) + [\mathit{toSend} > 3] \cdot \infty.$$


The bound on the right-hand side of the inequality is 4-inductive, but not
1-inductive; verifying the same bound using 1-induction requires finding a
non-trivial (and far less perspicuous) inductive invariant. Moreover, if we consider
an arbitrary number of packages to send, i.e., we drop $[\mathit{toSend} \leq 3]$, this bound
becomes invalid. In this case, our BMC procedure produces a counterexample,
i.e., values for toSend and maxFail, proving that the bound does not hold. $\triangleleft$
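The claim of Example 1 can be sanity-checked on concrete inputs with a small exact computation. The following is only an illustrative sketch, not the verification method of this paper: it assumes the loop starts with sent = 0 and fail = 0, and the function name `expected_extra_fails` is hypothetical. It unfolds the loop recursively and returns the exact expected number of increments of totalFail.

```python
from fractions import Fraction
from functools import lru_cache


def expected_extra_fails(to_send: int, max_fail: int) -> Fraction:
    """Exact expected number of increments of totalFail performed by the
    loop of Example 1, assuming it starts with sent = 0 and fail = 0."""
    p = Fraction(9, 10)  # probability that a single transmission succeeds

    @lru_cache(maxsize=None)
    def go(sent: int, fail: int) -> Fraction:
        if not (sent < to_send and fail < max_fail):
            return Fraction(0)  # loop guard is false: nothing more is counted
        on_success = go(sent + 1, 0)         # fail := 0; sent := sent + 1
        on_failure = 1 + go(sent, fail + 1)  # one more failure is counted
        return p * on_success + (1 - p) * on_failure

    return go(0, 0)


# Three packages stay below the bound; ten packages exceed it.
print(expected_extra_fails(3, 5) <= 1)
print(expected_extra_fails(10, 5) > 1)
```

The recursion terminates because every call either increases sent or increases fail while sent stays fixed. The second call illustrates why dropping the restriction on toSend invalidates the bound.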


Lifting the classical formalization (and SAT encoding) of k-induction over 
TSs to the probabilistic setting is non-trivial. We encounter the following chal- 
lenges: 
(A) Quantitative reachability. In a TS, a state reachable within k steps
remains reachable on increasing k. In contrast, reachability probabilities in 
Markov chains—a common operational model for probabilistic programs [28]— 
may increase on increasing k. Hence, proving that the probability of reaching 
a bad state remains below a given threshold is more intricate than reasoning 
about qualitative reachability. 

(B) Counterexamples are subsystems. In a TS, an acyclic path from an initial
to a bad state suffices as a witness for refuting safety, i.e., non-reachability. SAT 
encodings of k-induction rely on this by expressing the absence of witnesses 
up to a certain path-length. In the probabilistic setting, however, witnesses are 
no longer single paths [30]. Rather, a witness for the probability of reaching a 
bad state to exceed a threshold is a subsystem [15], i.e., a set of possibly cyclic 
paths. 

(C) Symbolic encodings. To enable fully automated verification, we need a 
suitable encoding such that our lifting integrates well into SMT solvers. Verify- 
ing probabilistic programs involves reasoning about execution trees, where each 
(weighted) branch corresponds to a probabilistic choice. A suitable encoding 
needs to capture such trees which requires more involved theories than encoding 
paths in classical k-induction. 

We address challenges (A) and (B) by developing latticed k-induction, which 
is a proof technique in the rather general setting of fixed point theory over 
complete lattices. Latticed k-induction generalizes classical k-induction in three 
aspects: (1) it works with any monotonic map on a complete lattice instead of 
being confined to the transition relation of a transition system, (2) it generalizes 
the Park induction principle for bounding fixed points of such monotonic maps, 
and (3) it extends from natural numbers k to (possibly transfinite) ordinals κ,
hence its short name: κ-induction.

It is this lattice-theoretic understanding that enables us to lift both k-in- 
duction and BMC to reasoning about quantitative properties of probabilistic 
programs. To enable automated reasoning, we address challenge (C) by an incre- 
mental SMT encoding based on the theory of quantifier-free mixed integer and 
real arithmetic with uninterpreted functions (QF_UFLIRA). We show how to
effectively compute all needed operations for κ-induction using the SMT encoding
and, in particular, how to decide quantitative entailments. 

A prototypical implementation of our method demonstrates that κ-induction
for (linear) probabilistic programs manages to automatically verify non-trivial
specifications for programs taken from the literature which, using existing
techniques, cannot be verified without synthesizing a stronger inductive invariant.

Due to space restrictions, most proofs and details about individual bench- 
marks have been omitted; they are found in an extended version of this paper [8]. 


Related Work. Besides the aforementioned related work on k-induction, we 
briefly discuss other automated analysis techniques for probabilistic systems and 
other approaches for bounding fixed points. Symbolic engines exist for exact 
inference [26] and sensitivity analysis [33]. Other automated approaches focus 
on bounding expected costs [56], termination analysis [2,16], and static analy- 
sis [3,67]. BMC has been applied in a rather rudimentary form to the on-the-fly 
verification of finite unfoldings of probabilistic programs [35], and the
enumerative generation of counterexamples in finite Markov chains [68].
(Semi-)automated invariant-synthesis techniques can be found in [6,24,41]. A recent variant
of IC3 for probabilistic programs called PrIC3 [7] is restricted to finite-state
systems. When applied to finite-state Markov chains, our κ-induction operator is
related to other operators that have been employed for determining reachability
probabilities through value iteration [4,31,61]. In particular, when iterated on
the candidate upper bound, the κ-induction operator coincides with the (upper
value iteration) operator in interval iteration [4]; the latter operator can be used
together with the up-to techniques (cf. [53,58,59]) to prove our κ-induction rule
sound (in contrast, we give an elementary proof). However, the κ-induction
operator avoids comparing current and previous iterations. It is thus easier to
implement and more amenable to SMT solvers. Finally, the proof rules for bounding
fixed points recently developed in [5] are restricted to finite-state systems.


2 Verification as a Fixed Point Problem 


We start by recapping some fundamentals on fixed points of monotonic operators 
on complete lattices before we state our target verification problem. 


Fundamentals. For the next three sections, we fix a complete lattice $(E, \sqsubseteq)$, i.e.,
a carrier set $E$ together with a partial order $\sqsubseteq$, such that every subset $S \subseteq E$
has a greatest lower bound $\bigsqcap S$ (also called the meet of $S$) and a least upper
bound $\bigsqcup S$ (also called the join of $S$). For just two elements $\{g, h\} \subseteq E$, we
denote their meet by $g \sqcap h$ and their join by $g \sqcup h$. Every complete lattice has a
least and a greatest element, which we denote by $\bot$ and $\top$, respectively.

In addition to $(E, \sqsubseteq)$, we also fix a monotonic operator $\Phi\colon E \to E$. By
the Knaster-Tarski theorem [43,47,66], every monotonic operator $\Phi$ admits a
complete lattice of (potentially infinitely many) fixed points. The least fixed
point $\mathrm{lfp}\,\Phi$ and the greatest fixed point $\mathrm{gfp}\,\Phi$ are moreover constructible by
(possibly transfinite) fixed point iteration from $\bot$ and $\top$, respectively: Cousot &
Cousot [18] showed that there exist ordinals $\alpha$ and $\beta$, such that¹


$$\mathrm{lfp}\,\Phi = \Phi^{\lceil\alpha\rceil}(\bot) \quad\text{and}\quad \mathrm{gfp}\,\Phi = \Phi^{\lfloor\beta\rfloor}(\top), \tag{$\dagger$}$$


where $\Phi^{\lceil\delta\rceil}(g)$ denotes the upper $\delta$-fold iteration and $\Phi^{\lfloor\delta\rfloor}(g)$ denotes the lower
$\delta$-fold iteration of $\Phi$ on $g$, respectively. Formally, $\Phi^{\lceil\delta\rceil}(g)$ is given by²


$$\Phi^{\lceil\delta\rceil}(g) \;=\; \begin{cases}
g & \text{if } \delta = 0, \\
\Phi\bigl(\Phi^{\lceil\gamma\rceil}(g)\bigr) & \text{if } \delta = \gamma + 1 \text{ is a successor ordinal,} \\
\bigsqcup \bigl\{ \Phi^{\lceil\gamma\rceil}(g) \mid \gamma < \delta \bigr\} & \text{if } \delta \text{ is a limit ordinal.}
\end{cases}$$


¹ We use lowercase Greek letters $\alpha$, $\beta$, $\gamma$, $\delta$, etc. to denote arbitrary (possibly transfinite)
ordinals and $i$, $j$, $k$, $m$, $n$, etc. to denote natural (finite) numbers in $\mathbb{N}$.

² To ensure well-definedness of transfinite iterations, we fix an ambient ordinal $\nu$ and
tacitly assume $\delta < \nu$ for all ordinals $\delta$ considered throughout this paper. Formally,
$\nu$ is the smallest ordinal such that $|\nu| > |E|$. Intuitively, $\nu$ then upper-bounds the
length of any repetition-free sequence over elements of $E$.


Intuitively, if $\delta$ is the successor of $\gamma$, then we simply do another iteration of $\Phi$.
If $\delta$ is a limit ordinal, then $\Phi^{\lceil\delta\rceil}(g)$ can also be thought of as a limit, namely
of iterating $\Phi$ on $g$. However, simply iterating $\Phi$ on $g$ need not always converge,
especially if the iteration does not yield an ascending chain. To remedy this, we
take as limit the join over the whole (possibly transfinite) iteration sequence,
i.e., the least upper bound over all elements that occur along the iteration. The
lower $\delta$-fold iteration $\Phi^{\lfloor\delta\rfloor}(g)$ is defined analogously to $\Phi^{\lceil\delta\rceil}(g)$, except that we
take a meet instead of a join whenever $\delta$ is a limit ordinal.

An important special case for fixed point iteration (see ($\dagger$)) is when the
operator $\Phi$ is Scott-continuous (or simply continuous), i.e., if
$\Phi\bigl(\bigsqcup\{g_1 \sqsubseteq g_2 \sqsubseteq \ldots\}\bigr) = \bigsqcup \Phi\bigl(\{g_1 \sqsubseteq g_2 \sqsubseteq \ldots\}\bigr)$.
In this case, $\alpha$ in ($\dagger$) coincides with the first infinite limit
ordinal $\omega$ (which can be identified with the set $\mathbb{N}$ of natural numbers). This fact
is also known as the Kleene fixed point theorem [1].
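On a finite lattice, the iterations in ($\dagger$) stabilize after finitely many steps and can be computed directly. The following minimal sketch assumes a powerset lattice over five states and a hypothetical monotonic operator `phi` (initial states plus one-step successors); all names and the transition relation are illustrative and not taken from the paper.

```python
def iterate_to_fixed_point(phi, start):
    """Apply phi repeatedly, starting from start, until the value is stable."""
    current = start
    while True:
        nxt = phi(current)
        if nxt == current:
            return current
        current = nxt


# Illustrative finite transition system (an assumption of this sketch).
STATES = frozenset({0, 1, 2, 3, 4})
INITIAL = frozenset({0})
TRANSITIONS = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)}


def phi(states):
    """Monotonic operator on the powerset lattice: initial states together
    with all one-step successors of the given set."""
    successors = {t for (s, t) in TRANSITIONS if s in states}
    return INITIAL | frozenset(successors)


lfp = iterate_to_fixed_point(phi, frozenset())  # upper iteration from bottom
gfp = iterate_to_fixed_point(phi, STATES)       # lower iteration from top

print(sorted(lfp))  # states reachable from the initial state
print(sorted(gfp))
```

Here the least fixed point is the set of reachable states, while the greatest fixed point is the full state set, so this operator has more than one fixed point.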


Problem Statement. Fixed points are ubiquitous in computer science. Prime 
examples of properties that can be conveniently characterized as least fixed 
points include both the set of reachable states in a transition system and the 
function mapping each state in a Markov chain to the probability of reaching 
some goal state (cf. [60]). However, least and greatest fixed points are often 
difficult or even impossible [39] to compute; it is thus desirable to bound them. 
For example, it may be sufficient to prove that a system modeled as a Markov
chain reaches a bad state from its initial state with probability at most $10^{-6}$,
instead of computing precise reachability probabilities for each state. Moreover,
if said probability is not bounded by $10^{-6}$, we would like to witness that as well.
In general lattice-theoretic terms, our problem statement reads as follows:


    Given a complete lattice $(E, \sqsubseteq)$, a monotonic operator $\Phi\colon E \to E$,
    and a candidate upper bound $f \in E$ on $\mathrm{lfp}\,\Phi$,

    prove or refute that $\mathrm{lfp}\,\Phi \sqsubseteq f$.


For proving, we will present latticed k-induction; for refuting, we will present 
latticed bounded model checking. Running both in parallel may (and under certain 
conditions: will) lead to a decision of the above problem. 


3 Latticed k-Induction 


In this section, we generalize the well-established k-induction verification
technique [23,29,37,45,55,65] to latticed k-induction (for short: κ-induction; reads:
“kappa induction”). With κ-induction, our aim is to prove that $\mathrm{lfp}\,\Phi \sqsubseteq f$. To
this end, we attempt “ordinary” induction, also known as Park induction:


Theorem 1 (Park Induction [57]). Let $f \in E$. Then

$$\Phi(f) \sqsubseteq f \quad\text{implies}\quad \mathrm{lfp}\,\Phi \sqsubseteq f.$$


Intuitively, this principle says: if pushing our candidate upper bound $f$ through $\Phi$
takes us down in the partial order $\sqsubseteq$, we have verified that $f$ is indeed an upper
bound on $\mathrm{lfp}\,\Phi$. The true power of Park induction is that applying $\Phi$ once tells
us something about iterating $\Phi$ possibly transfinitely often (see ($\dagger$) in Sect. 2).

Park induction, unfortunately, does not work in the reverse direction: If we
are unlucky, $f \sqsupset \mathrm{lfp}\,\Phi$ is an upper bound on $\mathrm{lfp}\,\Phi$, but nevertheless $\Phi(f) \not\sqsubseteq f$.
In this case, we say that $f$ is not inductive. But how can we verify that $f$ is
indeed an upper bound in such a non-inductive scenario? We search below $f$ for
a different, but inductive, upper bound on $\mathrm{lfp}\,\Phi$, that is, we


    search for an $h \in E$ such that $\mathrm{lfp}\,\Phi \sqsubseteq \Phi(h) \sqsubseteq h \sqsubseteq f$.


In order to perform a guided search for such an $h$, we introduce the κ-induction
operator, a modified version of $\Phi$ that is parameterized by our candidate $f$:


Definition 1 (κ-Induction Operator). For $f \in E$, we call

$$\Psi_f\colon E \to E, \qquad g \mapsto \Phi(g) \sqcap f$$

the κ-induction operator (with respect to $f$ and $\Phi$).


Fig. 1. κ-induction and latticed BMC in case that $\mathrm{lfp}\,\Phi \sqsubseteq f$. An arrow from $g$ to $h$
indicates $g \sqsubseteq h$. The solid blue arrow from $\Phi(\Psi_f^{\lfloor\kappa\rfloor}(f))$ to $f$ is the premise of
κ-induction, i.e., the LHS of Lemma 2, which implies the dash-dotted blue arrow from
$\Phi(\Psi_f^{\lfloor\kappa\rfloor}(f))$ to $\Psi_f^{\lfloor\kappa\rfloor}(f)$, i.e., the RHS of Lemma 2. The dashed blue arrow from $\mathrm{lfp}\,\Phi$
to $\Psi_f^{\lfloor\kappa\rfloor}(f)$ is a consequence of the dash-dotted arrow (by Park induction, Theorem
1) and ultimately proves that $\mathrm{lfp}\,\Phi \sqsubseteq f$.
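What does $\Psi_f$ do? As illustrated in Fig. 1, if $\Phi(f) \not\sqsubseteq f$ (i.e., $f$ is non-inductive)
then “at least some part of $\Phi(f)$ is greater than $f$”. If the whole of $\Phi(f)$ is
greater than $f$, then $f \sqsubset \Phi(f)$; if only some part of $\Phi(f)$ is greater and some is
smaller than $f$, then $f$ and $\Phi(f)$ are incomparable. The κ-induction operator $\Psi_f$
now rectifies $\Phi(f)$ being (partly) greater than $f$ by pulling $\Phi(f)$ down via the
meet with $f$ (i.e., via $\,\cdot \sqcap f$), so that the result is in no part greater than $f$.
Applying $\Psi_f$ to $f$ hence always yields something below or equal to $f$.

Together with the observation that $\Psi_f$ is monotonic, iterating $\Psi_f$ on $f$
necessarily descends from $f$ downwards in the direction of $\mathrm{lfp}\,\Phi$ (and never below):


Lemma 1 (Properties of the κ-Induction Operator). Let $f \in E$ and
let $\Psi_f$ be the κ-induction operator with respect to $f$ and $\Phi$. Then

(a) $\Psi_f$ is monotonic, i.e., $\forall g_1, g_2 \in E\colon\; g_1 \sqsubseteq g_2$ implies $\Psi_f(g_1) \sqsubseteq \Psi_f(g_2)$.
(b) Iterations of $\Psi_f$ starting from $f$ are descending, i.e., for all ordinals $\gamma$, $\delta$,

$$\gamma < \delta \quad\text{implies}\quad \Psi_f^{\lfloor\delta\rfloor}(f) \sqsubseteq \Psi_f^{\lfloor\gamma\rfloor}(f).$$

(c) $\Psi_f$ is dominated by $\Phi$, i.e., $\forall g \in E\colon\; \Psi_f(g) \sqsubseteq \Phi(g)$.
(d) If $\mathrm{lfp}\,\Phi \sqsubseteq f$, then for any ordinal $\delta$,

$$\mathrm{lfp}\,\Phi \;\sqsubseteq\; \ldots \;\sqsubseteq\; \Psi_f^{\lfloor\delta\rfloor}(f) \;\sqsubseteq\; \ldots \;\sqsubseteq\; \Psi_f^{\lfloor 2\rfloor}(f) \;\sqsubseteq\; \Psi_f(f) \;\sqsubseteq\; f.$$


The descending sequence $f \sqsupseteq \Psi_f(f) \sqsupseteq \Psi_f^{\lfloor 2\rfloor}(f) \sqsupseteq \ldots$ constitutes our guided
search for an inductive upper bound on $\mathrm{lfp}\,\Phi$. For each ordinal $\kappa$ (hence the
short name: κ-induction), $\Psi_f^{\lfloor\kappa\rfloor}(f)$ is a potential candidate for Park induction:


$$\Phi\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr) \;\overset{?}{\sqsubseteq}\; \Psi_f^{\lfloor\kappa\rfloor}(f). \tag{$\ddagger$}$$


For efficiency reasons, e.g., when offloading the above inequality check to an SMT
solver, we will not check the inequality ($\ddagger$) directly but a property equivalent
to ($\ddagger$), namely whether $\Phi(\Psi_f^{\lfloor\kappa\rfloor}(f))$ is below $f$ instead of $\Psi_f^{\lfloor\kappa\rfloor}(f)$:


Lemma 2 (Park Induction from κ-Induction). Let $f \in E$. Then

$$\Phi\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr) \sqsubseteq f \quad\text{iff}\quad \Phi\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr) \sqsubseteq \Psi_f^{\lfloor\kappa\rfloor}(f).$$


Proof. The if-direction is trivial, as $\Psi_f^{\lfloor\kappa\rfloor}(f) \sqsubseteq f$ (Lemma 1(d)). For only-if:

$$\begin{aligned}
\Psi_f^{\lfloor\kappa\rfloor}(f) \;&\sqsupseteq\; \Psi_f^{\lfloor\kappa+1\rfloor}(f) && \text{(by Lemma 1(b))} \\
&=\; \Psi_f\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr) && \text{(by definition of } \Psi_f^{\lfloor\kappa+1\rfloor}(f)\text{)} \\
&=\; \Phi\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr) \sqcap f && \text{(by definition of } \Psi_f\text{)} \\
&=\; \Phi\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr). && \text{(by the premise)}
\end{aligned}$$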
Algorithm 1: Latticed k-induction

    input:  $\Phi\colon E \to E$ and $f \in E$.
    output: “verify” if $f$ is a k-inductive invariant, diverge otherwise.

    1  $g \leftarrow f$;
    2  while $\Phi(g) \not\sqsubseteq f$ do
    3      $g \leftarrow \Psi_f(g)$;    // recall: $\Psi_f(g) = \Phi(g) \sqcap f$
    4  return verify;


Algorithm 2: Latticed BMC

    input:  $\Phi\colon E \to E$ and $f \in E$.
    output: “refute” if there exists $k \in \mathbb{N}$ with $\Phi^{\lceil k\rceil}(\bot) \not\sqsubseteq f$,
            diverge otherwise.

    1  $g \leftarrow \bot$;
    2  repeat
    3      $g \leftarrow \Phi(g)$;
    4  until $g \not\sqsubseteq f$;
    5  return refute;


If $\Phi(\Psi_f^{\lfloor\kappa\rfloor}(f)) \sqsubseteq f$, then Lemma 2 tells us that $\Psi_f^{\lfloor\kappa\rfloor}(f)$ is Park inductive and
thereby an upper bound on $\mathrm{lfp}\,\Phi$. Since iterating $\Psi_f$ on $f$ yields a descending
iteration sequence (see Lemma 1(b)), $\Psi_f^{\lfloor\kappa\rfloor}(f)$ is below $f$ and therefore $f$ is also
an upper bound on $\mathrm{lfp}\,\Phi$. Put in more traditional terms, we have shown that
$\Psi_f^{\lfloor\kappa\rfloor}(f)$ is an inductive invariant stronger than $f$. Formulated as a proof rule,
we obtain the following induction principle:


Theorem 2 (κ-Induction). Let $f \in E$ and let $\kappa$ be an ordinal. Then

$$\Phi\bigl(\Psi_f^{\lfloor\kappa\rfloor}(f)\bigr) \sqsubseteq f \quad\text{implies}\quad \mathrm{lfp}\,\Phi \sqsubseteq f.$$


Proof. Following the argument above, for details see [8, Appx. A.2].
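The proof rule can be tried out on a tiny quantitative instance. The sketch below is an illustration under stated assumptions, not the SMT-based implementation described in this paper: the lattice consists of maps from the four states of a hypothetical Markov chain to probabilities, ordered pointwise; `phi` is the usual one-step reachability operator for that chain; and `k_induction` follows Algorithm 1 with an additional iteration bound `max_k` so that it always terminates.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def k_induction(phi, leq, meet, f, max_k):
    """Bounded variant of Algorithm 1: return the least k <= max_k such that
    f is k-inductive, or None if no such k is found within the bound."""
    g = f
    for k in range(1, max_k + 1):
        if leq(phi(g), f):   # premise of the induction rule
            return k
        g = meet(phi(g), f)  # one application of the induction operator
    return None


# Hypothetical Markov chain: s0 -> s1 (prob. 1), s1 -> goal / sink (1/2 each).
def phi(x):
    return {
        "s0": x["s1"],
        "s1": HALF * x["goal"] + HALF * x["sink"],
        "goal": Fraction(1),
        "sink": x["sink"],
    }


def leq(x, y):
    return all(x[s] <= y[s] for s in x)  # pointwise order


def meet(x, y):
    return {s: min(x[s], y[s]) for s in x}  # pointwise minimum


# Candidate bound: s0 reaches goal with probability at most 1/2.
f = {"s0": HALF, "s1": Fraction(1), "goal": Fraction(1), "sink": Fraction(0)}
print(k_induction(phi, leq, meet, f, max_k=10))
```

In this instance the candidate is not Park inductive, because the trivial bound 1 on s1 propagates to s0, but a single application of the induction operator tightens the value at s1 to 1/2, after which the premise holds.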


An illustration of κ-induction is shown in (the right frame of) Fig. 1. For every
ordinal $\kappa$, if $\Phi(\Psi_f^{\lfloor\kappa\rfloor}(f)) \sqsubseteq f$, then we call $f$ $(\kappa{+}1)$-inductive (for $\Phi$). In
particular, κ-induction generalizes Park induction, in the sense that 1-induction is Park
induction and $(\kappa > 1)$-induction is a more general principle of induction.

Algorithm 1 depicts a (semi-)algorithm that performs latticed k-induction (for
$k < \omega$) in order to prove $\mathrm{lfp}\,\Phi \sqsubseteq f$ by iteratively increasing $k$. For implementing
this algorithm, we require, of course, that both $\Phi$ and $\Psi_f$ are computable and that
$\sqsubseteq$ is decidable. Notice that the loop (lines 2-3) never terminates if $f \sqsubset \Phi(f)$,
a condition that can easily be checked before entering the loop. Even with this
optimization, however, Algorithm 1 is a proper semi-algorithm: even if $\mathrm{lfp}\,\Phi \sqsubseteq f$,
then $f$ is still not guaranteed to be k-inductive for some $k < \omega$. And even if an
algorithm could somehow perform transfinitely many iterations, then $f$ is still
not guaranteed to be κ-inductive for some ordinal $\kappa$:


Counterexample 1 (Incompleteness of κ-Induction). Consider the
carrier set $\{0, 1, 2\}$, partial order $0 \sqsubset 1 \sqsubset 2$, and the monotonic operator $\Phi$ with
$\Phi(0) = 0 = \mathrm{lfp}\,\Phi$, and $\Phi(1) = 2$, and $\Phi(2) = 2 = \mathrm{gfp}\,\Phi$. Then $\mathrm{lfp}\,\Phi \sqsubseteq 1$, but for
any ordinal $\kappa$, $\Psi_1^{\lfloor\kappa\rfloor}(1) = 1$ and $\Phi(1) = 2 \not\sqsubseteq 1$. Hence 1 is not κ-inductive. $\triangleleft$
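Counterexample 1 is small enough to replay mechanically. The following sketch encodes the three-element chain with Python integers, uses `min` as meet, and runs a bounded variant of Algorithm 1; the bound `max_k` and the function name are assumptions of this illustration.

```python
def k_induction(phi, leq, meet, f, max_k):
    """Bounded variant of Algorithm 1: least k <= max_k for which f is
    k-inductive, or None if the bound is exhausted."""
    g = f
    for k in range(1, max_k + 1):
        if leq(phi(g), f):
            return k
        g = meet(phi(g), f)
    return None


# The chain 0 < 1 < 2 with Phi(0) = 0, Phi(1) = 2, Phi(2) = 2.
PHI = {0: 0, 1: 2, 2: 2}


def phi(x):
    return PHI[x]


def leq(a, b):
    return a <= b


# The candidate 1 is a valid upper bound on lfp Phi = 0, yet the search is
# stuck at g = 1 forever, because min(Phi(1), 1) = 1.
print(k_induction(phi, leq, min, 1, max_k=1000))

# The greatest element 2, in contrast, is already Park inductive.
print(k_induction(phi, leq, min, 2, max_k=1000))
```

Raising `max_k` does not help for the candidate 1, which mirrors the statement that no ordinal number of iterations suffices.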


Despite its incompleteness, we now provide a sufficient criterion which ensures
that every upper bound on $\mathrm{lfp}\,\Phi$ is κ-inductive for some ordinal $\kappa$.

Theorem 3 (Completeness of κ-Induction for Unique Fixed Point). If
$\mathrm{lfp}\,\Phi = \mathrm{gfp}\,\Phi$ (i.e., $\Phi$ has exactly one fixed point), then, for every $f \in E$,


    $\mathrm{lfp}\,\Phi \sqsubseteq f$ implies $f$ is κ-inductive for some ordinal $\kappa$.


Proof. By the Knaster-Tarski theorem, we have $\Phi^{\lfloor\beta\rfloor}(\top) = \mathrm{gfp}\,\Phi$ for some
ordinal $\beta$. We then show that $f$ is $(\beta{+}1)$-inductive; see [8, Appx. A.3] for details.


The proof of the above theorem immediately yields that, if the unique fixed point
can be reached through finite fixed point iterations starting at $\top$, then $f$ is
k-inductive for some natural number $k$; Algorithm 1 thus eventually terminates.


Corollary 1. If $\Phi^{\lfloor n\rfloor}(\top) = \mathrm{lfp}\,\Phi$ for some $n \in \mathbb{N}$, then, for every $f \in E$,


    $\mathrm{lfp}\,\Phi \sqsubseteq f$ implies $f$ is n-inductive for some $n \in \mathbb{N}$.


4 Latticed vs. Classical k-Induction 


We show that our purely lattice-theoretic κ-induction from Sect. 3 generalizes
classical k-induction for hardware and software verification. To this end, we first
recap how k-induction is typically formalized in the literature [10,23,29,37]:
Let $\mathit{TS} = (S, I, T)$ be a transition system, where $S$ is a (countable) set of
states, $I \subseteq S$ is a non-empty set of initial states, and $T \subseteq S \times S$ is a transition
relation. As in the seminal work on k-induction [65], we require that $T$ is a
total relation, i.e., every state has at least one successor. This requirement is
sometimes overlooked in the literature, which renders the classical SAT-based
formulation of k-induction ((1a) and (1b) below) unsound in general.

Our goal is to verify that a given invariant property $P \subseteq S$ covers all states
reachable in $\mathit{TS}$ from some initial state. Suppose that $I$, $T$ and $P$ are
characterized by logical formulae $I(s)$, $T(s, s')$ and $P(s)$ (over the free variables $s$ and $s'$),
respectively. Then, achieving the above goal with classical k-induction amounts
to proving the validity of


$$I(s_1) \wedge T(s_1, s_2) \wedge \ldots \wedge T(s_{k-1}, s_k) \implies P(s_1) \wedge \ldots \wedge P(s_k), \text{ and} \tag{1a}$$

$$P(s_1) \wedge T(s_1, s_2) \wedge \ldots \wedge P(s_k) \wedge T(s_k, s_{k+1}) \implies P(s_{k+1}). \tag{1b}$$


Here, the base case (1a) asserts that $P$ holds for all states reachable within
$k$ transition steps from some initial state; the induction step (1b) formalizes
that $P$ is closed under taking up to $k$ transition steps, i.e., if we start in $P$ and
stay in $P$ for up to $k$ steps, then we also end up in $P$ after taking the
$(k{+}1)$-st step. If both (1a) and (1b) are valid, then classical k-induction tells us that
the property $P$ holds for all reachable states of $\mathit{TS}$. How is the above principle
reflected in latticed k-induction (cf. Sect. 3)? For that, we choose the complete
lattice $(2^S, \subseteq)$, where $2^S$ denotes the powerset of $S$; the least element is $\bot = \emptyset$
and the meet operation is standard intersection $\cap$.
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Moreover, we define a monotonic operator ® whose least fixed point precisely 
characterizes the set of reachable states of the transition system TS: 


Ge 2 = 25, F + IU Suces(F), 


That is, maps any given set of states F C S to the union of the initial states I 

and of those states Succs(F’) that are reachable from F using a single transition.’ 
Using the «-induction operator Wp constructed from ® and P according to 

Definition 1, the principle of k-induction (cf. Theorem 2) then tells us that 


a (w}"(P)) CP impies Ifpd CP. 
SEY 


reachable states of TS 


For our above choices, the premise of κ-induction equals the classical formalization
of k-induction, i.e., formulae (1a) and (1b), because the set of initial states I
is “baked into” the operator Φ. More concretely, for the base case (1a), we have


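\[
\underbrace{\underbrace{\underbrace{I(s_1)}_{\Phi(\emptyset)} \land\, T(s_1, s_2)}_{\Phi^{\lceil 2 \rceil}(\emptyset)} \land \ldots \land T(s_{k-1}, s_k)}_{\Phi^{\lceil k \rceil}(\emptyset)} \implies P(s_1) \land \ldots \land P(s_k),
\qquad \text{meaning } \Phi^{\lceil k \rceil}(\emptyset) \subseteq P.
\]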


In other words, formula (1a) captures those states that are reachable from I via
at most k transitions. If we assume that (1a) is valid, then P contains all initial
states and formula (1b) coincides with the premise of κ-induction:


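\[
\underbrace{\underbrace{\underbrace{\underbrace{P(s_1) \land T(s_1, s_2)}_{\Phi(P)} \land\, P(s_2)}_{\Psi_P(P) \,=\, \Phi(P) \cap P} \land\, T(s_2, s_3) \land \ldots \land P(s_k)}_{\Psi_P^{k-1}(P)} \land\, T(s_k, s_{k+1})}_{\Phi(\Psi_P^{k-1}(P))} \implies P(s_{k+1}),
\qquad \text{meaning } \Phi\bigl(\Psi_P^{k-1}(P)\bigr) \subseteq P.
\]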


It follows that, when considering transition systems, our (latticed) κ-induction
is equivalent to the classical notion of k-induction for κ < ω:


Theorem 4. For every natural number k ≥ 1,


\[ \Phi\bigl(\Psi_P^{k-1}(P)\bigr) \subseteq P \quad\text{iff}\quad \text{formulae (1a) and (1b) are valid.} \]
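
The correspondence stated in Theorem 4 can be tried out on a small finite transition system. The following Python sketch is only an illustration under stated assumptions: the six-state system, the relation TRANS, and the helper names succs, phi, and is_k_inductive are hypothetical and are not taken from the paper or its tool. It iterates Ψ_P(F) = Φ(F) ∩ P exactly k − 1 times, starting from P, and then checks Φ(F) ⊆ P.

```python
from typing import Set, Tuple

State = int

# Hypothetical example system: states 0 and 1 are reachable from the initial
# state 0; the path 3 -> 4 -> 5 is unreachable, and state 5 violates P.
INIT: Set[State] = {0}
TRANS: Set[Tuple[State, State]] = {(0, 1), (1, 0), (3, 4), (4, 5)}
PROP: Set[State] = {0, 1, 3, 4}


def succs(states: Set[State]) -> Set[State]:
    """States reachable from `states` with exactly one transition."""
    return {t2 for (t1, t2) in TRANS if t1 in states}


def phi(states: Set[State]) -> Set[State]:
    """Phi(F) = I united with Succs(F)."""
    return set(INIT) | succs(states)


def is_k_inductive(prop: Set[State], k: int) -> bool:
    """Check Phi(Psi_P^{k-1}(P)) subset-of P, where Psi_P(F) = Phi(F) & P."""
    current = set(prop)
    for _ in range(k - 1):
        current = phi(current) & prop
    return phi(current) <= prop


results = {k: is_k_inductive(PROP, k) for k in (1, 2, 3)}
print(results)
```

In this toy system, the unreachable path 3 → 4 → 5 stays inside P for two steps before leaving it, so P is neither 1- nor 2-inductive, but the check succeeds for k = 3.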


³ Formally, Succs(F) ≜ { t' | t ∈ F, (t, t') ∈ T }.
C ::= skip                       e ::= n                                   φ ::= e < e
   |  x := e                        |  x                                      |  φ ∧ φ
   |  C ; C                         |  n · e                                  |  ¬φ
   |  { C } [ p ] { C }             |  e + e
   |  if (φ) { C } else { C }       |  e ∸ e   (monus: max{0, e − e})
   |  while (φ) { C }
(a) pGCL programs                (b) Linear expressions                    (c) Linear guards


Fig. 2. Syntax of pGCL programs, linear expressions, and guards, where x is a variable
taken from a countable set Vars of program variables (evaluating to natural numbers),
p ∈ [0,1] ∩ ℚ is a rational probability, and n ∈ ℕ is a constant.


5 Latticed Bounded Model Checking 


We complement κ-induction with a latticed analog of bounded model checking
[11,12] for refuting that lfp Φ ⊑ f. In lattice-theoretic terms, bounded model
checking amounts to a fixed point iteration of Φ on ⊥ while continually checking
whether the iteration exceeds our candidate upper bound f. If so, then we have
indeed refuted lfp Φ ⊑ f:


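Theorem 5 (Soundness of Latticed BMC). Let f ∈ E. Then


\[ \exists\, \text{ordinal } \delta\colon\ \Phi^{\lceil \delta \rceil}(\bot) \not\sqsubseteq f \quad\text{implies}\quad \mathrm{lfp}\,\Phi \not\sqsubseteq f. \]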


Furthermore, if we were actually able to perform transfinite iterations of Φ on ⊥,
then latticed bounded model checking would also be complete: If f is in fact not
an upper bound on lfp Φ, this will be witnessed at some ordinal:


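Theorem 6 (Completeness of Latticed BMC). Let f ∈ E. Then


\[ \mathrm{lfp}\,\Phi \not\sqsubseteq f \quad\text{implies}\quad \exists\, \text{ordinal } \delta\colon\ \Phi^{\lceil \delta \rceil}(\bot) \not\sqsubseteq f. \]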


More practically relevant, if Φ is continuous (which is the case for Bellman
operators characterizing reachability probabilities in Markov chains), then a simple
finite fixed point iteration, see Algorithm 2, is sound and complete for refutation:


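Corollary 2 (Latticed BMC for Continuous Operators). Let f ∈ E and
let Φ be continuous. Then


\[ \exists\, n \in \mathbb{N}\colon\ \Phi^{n}(\bot) \not\sqsubseteq f \quad\text{iff}\quad \mathrm{lfp}\,\Phi \not\sqsubseteq f. \]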
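
The finite fixed point iteration behind Corollary 2 can be sketched generically. The Python code below is a minimal illustration and not the paper's Algorithm 2: the function name bmc_refute, the depth cut-off max_depth, and the powerset example (INIT, TRANS, reach_step) are assumptions made for this sketch. The iteration starts at the least element and reports the first n for which Φ^n(⊥) is no longer below the candidate f; the cut-off is needed because the iteration would otherwise diverge for correct candidates.

```python
from typing import Callable, FrozenSet, Optional, TypeVar

Elem = TypeVar("Elem")


def bmc_refute(
    phi: Callable[[Elem], Elem],
    bottom: Elem,
    leq: Callable[[Elem, Elem], bool],
    candidate: Elem,
    max_depth: int,
) -> Optional[int]:
    """Return the smallest n <= max_depth with phi^n(bottom) not below
    `candidate`, or None if no such n is found within the cut-off."""
    current = bottom
    for n in range(max_depth + 1):
        if not leq(current, candidate):
            return n
        current = phi(current)
    return None


# Hypothetical instance on the powerset lattice of states {0, 1, 2, 3}.
INIT: FrozenSet[int] = frozenset({0})
TRANS = {(0, 1), (1, 2), (2, 3)}


def reach_step(states: FrozenSet[int]) -> FrozenSet[int]:
    """Phi(F) = I united with the one-step successors of F."""
    return INIT | frozenset(t2 for (t1, t2) in TRANS if t1 in states)


def subset(a: FrozenSet[int], b: FrozenSet[int]) -> bool:
    return a <= b


depth_bad = bmc_refute(reach_step, frozenset(), subset, frozenset({0, 1, 2}), 10)
depth_ok = bmc_refute(reach_step, frozenset(), subset, frozenset({0, 1, 2, 3}), 10)
print(depth_bad, depth_ok)
```

A result of None only means that no violation was found up to the chosen depth; it does not by itself establish that the candidate is an upper bound.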


6 Probabilistic Programs 


In the remainder of this article, we employ latticed k-induction and BMC to ver- 
ify imperative programs with access to discrete probabilistic choices—branching 
on the outcomes of coin flips. In this section, we briefly recap the necessary
background on formal reasoning about probabilistic programs (cf. [44,49] for 
details). 


6.1 The Probabilistic Guarded Command Language 


Syntax. Programs in the probabilistic guarded command language pGCL adhere
to the grammar in Fig. 2a. The semantics of most statements is standard. In
particular, the probabilistic choice {C1} [p] {C2} flips a coin with bias p ∈
[0,1] ∩ ℚ. If the coin yields heads, it executes C1; otherwise, C2. In addition
to the syntax in Fig. 2, we admit standard expressions that are definable as
syntactic sugar, e.g., true, false, φ1 ∨ φ2, e1 = e2, e1 ≤ e2, etc.


Program States. A program state σ maps every variable in Vars to its value,
i.e., a natural number in ℕ.⁴ To ensure that the set of program states Σ remains
countable,⁵ we restrict ourselves to states in which only finitely many variables
(those that appear in a given program) evaluate to non-zero values. Formally,


\[ \Sigma \triangleq \{\, \sigma\colon \mathsf{Vars} \to \mathbb{N} \mid |\{\, x \in \mathsf{Vars} \mid \sigma(x) \neq 0 \,\}| < \infty \,\}. \]


The evaluation of expressions e and guards φ under a state σ, denoted by e(σ)
and φ(σ), is standard. For example, we define the evaluation of “monus” as


\[ (e_1 \dot{-} e_2)(\sigma) = \max\{0,\ e_1(\sigma) - e_2(\sigma)\}. \]


6.2 Weakest Preexpectations 


Expectations. An expectation f: Σ → ℝ^∞_{≥0} is a map from program states to the
non-negative reals extended by infinity. We denote by E the set of all expectations.
Moreover, (E, ⪯) forms a complete lattice, where the partial order ⪯ is
given by the pointwise application of the canonical ordering ≤ on ℝ^∞_{≥0}, i.e.,


\[ f \preceq g \quad\text{iff}\quad \forall \sigma \in \Sigma\colon\ f(\sigma) \le g(\sigma). \]


To conveniently describe expectations evaluating to some r ∈ ℝ^∞_{≥0} for every state,
we slightly abuse notation and denote by r the constant expectation λσ. r. Similarly,
given an arithmetic expression e, we denote by e the expectation λσ. e(σ).


⁴ We prefer unsigned integers because our quantitative “specifications” (aka expectations)
must evaluate to non-negative numbers. Otherwise, expectations like x+y are
not well-defined, and, as a remedy, we would frequently have to take the absolute
value of every program variable. Restricting ourselves to unsigned variables does not
decrease expressive power as signed variables can be emulated (cf. [9, Sec. 11.2]).

⁵ In order to avoid any technical issues pertaining to measurability.
Table 1. Rules defining the weakest preexpectation transformer.


C                           wp⟦C⟧(g)

skip                        g
x := e                      g[x/e]
C1; C2                      wp⟦C1⟧(wp⟦C2⟧(g))
{C1} [p] {C2}               p · wp⟦C1⟧(g) + (1 − p) · wp⟦C2⟧(g)
if (φ) {C1} else {C2}       [φ] · wp⟦C1⟧(g) + [¬φ] · wp⟦C2⟧(g)
while (φ) {C'}              lfp h. [¬φ] · g + [φ] · wp⟦C'⟧(h)


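The least element of (E, ⪯) is 0 and the greatest element is ∞. We employ the
Iverson bracket notation to cast Boolean expressions into expectations, i.e.,


\[ [\varphi] \triangleq \lambda\sigma.\ \begin{cases} 1 & \text{if } \varphi(\sigma) = \mathsf{true}, \\ 0 & \text{if } \varphi(\sigma) = \mathsf{false}. \end{cases} \]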


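The weakest preexpectation transformer wp: pGCL → (E → E) is defined in
Table 1, where g[x/e] denotes the substitution of variable x by expression e, i.e.,


\[ g[x/e] \triangleq \lambda\sigma.\ g\bigl(\sigma[x \mapsto e(\sigma)]\bigr), \quad\text{where}\quad \sigma[x \mapsto e(\sigma)] \triangleq \lambda y.\ \begin{cases} e(\sigma) & \text{if } y = x, \\ \sigma(y) & \text{otherwise.} \end{cases} \]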


We call wp⟦C⟧(g) the weakest preexpectation of program C w.r.t. postexpectation
g. The weakest preexpectation wp⟦C⟧(g) is itself an expectation of type E,
which maps each initial state σ to the expected value of g after running C on σ.
More formally, if μ_C^σ is the distribution over final states obtained by executing C
on initial state σ, then for any postexpectation g [44],


\[ \mathrm{wp}[\![C]\!](g)(\sigma) = \sum_{\tau \in \Sigma} \mu_C^{\sigma}(\tau) \cdot g(\tau). \]


For a gentle introduction to weakest preexpectations, see [38, Chap. 2 and 4]. 
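
The loop-free rules of Table 1 translate almost literally into a small recursive function. The Python sketch below is an illustration only: the tuple-based program representation, the tag names ("skip", "assign", "seq", "choice", "ite"), and the use of Python callables for expressions, guards, and expectations are assumptions of this sketch and are unrelated to how any existing tool represents programs. Loops are deliberately omitted because their semantics is a least fixed point.

```python
from fractions import Fraction
from typing import Callable, Dict

State = Dict[str, int]
Expectation = Callable[[State], Fraction]


def wp(prog: tuple, post: Expectation) -> Expectation:
    """Weakest preexpectation of a loop-free program w.r.t. `post`."""
    tag = prog[0]
    if tag == "skip":
        return post
    if tag == "assign":  # ("assign", variable name, expression)
        _, var, expr = prog
        # g[x/e]: evaluate post in the state updated at x.
        return lambda s: post({**s, var: expr(s)})
    if tag == "seq":  # ("seq", C1, C2)
        _, first, second = prog
        return wp(first, wp(second, post))
    if tag == "choice":  # ("choice", p, C1, C2)
        _, p, left, right = prog
        w_left, w_right = wp(left, post), wp(right, post)
        return lambda s: p * w_left(s) + (1 - p) * w_right(s)
    if tag == "ite":  # ("ite", guard, C1, C2)
        _, guard, then_c, else_c = prog
        w_then, w_else = wp(then_c, post), wp(else_c, post)
        return lambda s: w_then(s) if guard(s) else w_else(s)
    raise ValueError(f"unsupported statement: {tag}")


# Hypothetical program {x := 0} [1/2] {c := c + 1} with postexpectation c.
body = (
    "choice",
    Fraction(1, 2),
    ("assign", "x", lambda s: 0),
    ("assign", "c", lambda s: s["c"] + 1),
)
pre = wp(body, lambda s: Fraction(s["c"]))
value = pre({"c": 3, "x": 1})
print(value)
```

For the state with c = 3, the two branches contribute 1/2 · 3 and 1/2 · 4, so the computed preexpectation is 7/2.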


7 BMC and k-Induction for Probabilistic Programs 


We now instantiate latticed κ-induction and BMC (as developed in Sects. 2 to
5) to enable verification of loops written in pGCL; we discuss practical aspects
later in Sects. 7.1 to 7.3 and Sect. 8. For the next two sections, we fix a loop


C_loop = while (φ) { C }.


For simplicity, we assume that the loop body C is loop-free (every probabilistic 
program can be rewritten as a single while loop with loop-free body [62]). 
Given an expectation g ∈ E and a candidate upper bound f ∈ E on
the expected value of g after executing C_loop (i.e., wp⟦C_loop⟧(g)), we will
apply latticed verification techniques to check whether f indeed upper-bounds
wp⟦C_loop⟧(g).
To this end, we denote by Φ the characteristic functional of C_loop and g, i.e.,


\[ \Phi\colon E \to E, \quad h \mapsto [\neg\varphi] \cdot g + [\varphi] \cdot \mathrm{wp}[\![C]\!](h), \]


whose least fixed point defines wp⟦C_loop⟧(g) (cf. Table 1). We remark that Φ is
a monotonic (and in fact even continuous) operator over the complete lattice
(E, ⪯) (cf. Sect. 6.2). In this lattice, the meet is a pointwise minimum, i.e.,


\[ h \sqcap h' = h \min h' = \lambda\sigma.\ \min\{h(\sigma),\ h'(\sigma)\}. \]


By Definition 1, Φ and f then induce the (continuous) κ-induction operator


\[ \Psi_f\colon E \to E, \quad h \mapsto \Phi(h) \min f. \]


With this setup, we obtain the following proof rule for reasoning about proba- 
bilistic loops as an immediate consequence of Theorem 2: 


Corollary 3 (k-Induction for pGCL). For every natural number k ∈ ℕ,


\[ \Phi\bigl(\Psi_f^{k}(f)\bigr) \preceq f \quad\text{implies}\quad \mathrm{wp}[\![C_{loop}]\!](g) \preceq f. \]


Analogously, refuting that f upper-bounds the expected value of g after execution
of C_loop via bounded model checking is an instance of Corollary 2:


Corollary 4 (Bounded Model Checking for pGCL).

\[ \exists\, n \in \mathbb{N}\colon\ \Phi^{n}(0) \not\preceq f \quad\text{iff}\quad \mathrm{wp}[\![C_{loop}]\!](g) \not\preceq f. \]

Example 2 (Geometric Loop). The pGCL program

C_geo = while (x = 1) { {x := 0} [0.5] {c := c + 1} }


keeps flipping a fair coin until it flips heads, whereupon it sets x to 0 and terminates.
Whenever it flips tails instead, it increments the counter c and continues. We
refer to C_geo as the “geometric loop” because after its execution, the counter
variable c is distributed according to a geometric distribution.

What is a (preferably small) upper bound on the expected value wp⟦C_geo⟧(c)
of c after execution of C_geo? Using 2-induction, we can (automatically) verify that
c + 1 is indeed an upper bound: Since Φ(Ψ_{c+1}(c + 1)) ⪯ c + 1, where Φ denotes
the characteristic functional of C_geo, Corollary 3 yields wp⟦C_geo⟧(c) ⪯ c + 1.

However, c + 1 cannot be proven an upper bound using Park induction
as it is not inductive. Moreover, it is indeed the least upper bound, i.e., any
smaller bound is refutable using BMC (cf. Corollary 4). For example, we have
wp⟦C_geo⟧(c) ⋠ c + 0.99, since Φ^{⌈11⌉}(0) ⋠ c + 0.99. Finally, we remark that some
correct upper bounds only become κ-inductive for transfinite ordinals κ. For
instance, the innocuous-looking bound 2·c + 1 is not k-inductive for any natural
number k, but it is (ω + 1)-inductive, since Φ(Ψ_{2·c+1}^{⌊ω⌋}(2·c + 1)) ⪯ 2·c + 1. ◁
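
The claims of Example 2 can be explored numerically. The Python sketch below is a sanity check under stated assumptions rather than a proof: expectations are modeled as Python functions of (c, x), the comparison ⪯ is only tested on a finite grid of states (c below an arbitrarily chosen bound, x in {0, 1}), and all helper names are hypothetical. Exact rational arithmetic avoids rounding effects near the bound c + 0.99.

```python
from fractions import Fraction as Fr
from typing import Callable, Optional

Exp = Callable[[int, int], Fr]


def phi(h: Exp) -> Exp:
    """Characteristic functional of the geometric loop w.r.t. postexpectation c."""
    def out(c: int, x: int) -> Fr:
        if x != 1:
            return Fr(c)  # loop guard false: return the postexpectation
        return Fr(1, 2) * h(c, 0) + Fr(1, 2) * h(c + 1, 1)
    return out


def meet(h1: Exp, h2: Exp) -> Exp:
    """Pointwise minimum of two expectations."""
    return lambda c, x: min(h1(c, x), h2(c, x))


def leq_on_grid(h1: Exp, h2: Exp, bound: int = 50) -> bool:
    """Test h1 <= h2 on finitely many states only (an approximation)."""
    return all(h1(c, x) <= h2(c, x) for c in range(bound) for x in (0, 1))


def candidate(c: int, x: int) -> Fr:
    return Fr(c) + 1


park_inductive = leq_on_grid(phi(candidate), candidate)
two_inductive = leq_on_grid(phi(meet(phi(candidate), candidate)), candidate)


def first_refutation(f: Exp, max_n: int = 60) -> Optional[int]:
    """Smallest n <= max_n with phi^n(0) not below f on the grid, else None."""
    h: Exp = lambda c, x: Fr(0)
    for n in range(max_n + 1):
        if not leq_on_grid(h, f):
            return n
        h = phi(h)
    return None


refuted_at = first_refutation(lambda c, x: Fr(c) + Fr(99, 100))
print(park_inductive, two_inductive, refuted_at)
```

On this grid, c + 1 fails the Park-style check but passes the 2-induction check, and c + 0.99 is refuted. With the counting convention of this sketch (the zeroth iterate is the constant 0), the first violation appears at n = 12; a convention that starts counting at Φ(0) would report depth 11.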




In principle, we can semi-decide whether wp⟦C_loop⟧(g) ⋠ f holds or whether
f is k-inductive for some k: it suffices to run Algorithms 1 and 2 in parallel.
However, for these two algorithms to actually be semi-decision procedures, we
cannot admit arbitrary expectations. Rather, we restrict ourselves to a suitable
subset Exp of expectations in E satisfying all of the following requirements:


1. Exp is closed under computing the characteristic functional Φ, i.e.,
   ∀h ∈ Exp: Φ(h) is computable and belongs to Exp.
2. Quantitative entailments between expectations in Exp are decidable, i.e.,
   ∀h, h' ∈ Exp: it is decidable whether h ⪯ h'.
3. (For k-induction) Exp is closed under computing meets, i.e.,
   ∀h, h' ∈ Exp: h min h' is computable and belongs to Exp.


Below, we show that linear expectations meet all of the above requirements. 


7.1 Linear Expectations 


Recall from Fig. 2b that we assume all expressions appearing in pGCL programs
to be linear. For our fragment of syntactic expectations, we consider extended
linear expressions ẽ that (1) are defined over rationals instead of natural numbers
and (2) admit ∞ as a constant (but not as a subexpression). Formally, the set
of extended linear expressions is given by the following grammar:


ẽ ::= e | ∞        e ::= r | x | r · e | e + e | e ∸ e        (r ∈ ℚ_{≥0})


Similarly, we admit extended linear expressions (without ∞) in linear guards φ.⁶
With these adjustments to expressions and guards in mind, the set LinExp of
linear expectations is defined by the grammar


h ::= ẽ | [φ] · h | h + h.


We write h = h' if h and h' are syntactically identical; and h ≡ h' if they are
semantically equivalent, i.e., if for all states σ, we have h(σ) = h'(σ).

Furthermore, the rescaling c · h of a linear expectation h by a constant c ∈ ℚ_{≥0}
is syntactic sugar for rescaling suitable⁷ arithmetic subexpressions of h, e.g.,


1/2 · ([x = 1] · 4 + 1/3 · x + ∞) = 1/2 · [x = 1] · 4 + 1/2 · 1/3 · x + ∞ ∈ LinExp.


A formal definition of the rescaling c · h is found in [8, Appx. A.5].


⁶ We do not admit ∞ in guards for convenience. In principle, all comparisons with ∞
in guards can be removed by a simple preprocessing step.

⁷ We do not rescale every subexpression to account for the corner cases c · ∞ = ∞
and 0 · c = 0.
If we choose a linear expectation h as a postexpectation, then a quick inspection
of Table 1 reveals that the weakest preexpectation wp⟦C⟧(h) of any loop-free
pGCL program C and h yields a linear expectation again. Hence, linear expectations
are closed under applying Φ (Requirement 1 above) because


\[ \forall g, h \in \mathsf{LinExp}\colon\quad \Phi(h) = \underbrace{\underbrace{[\neg\varphi] \cdot g}_{\in\, \mathsf{LinExp}} + \underbrace{[\varphi] \cdot \mathrm{wp}[\![C]\!](h)}_{\in\, \mathsf{LinExp}}}_{\in\, \mathsf{LinExp}}. \]


7.2 Deciding Quantitative Entailments Between Linear 
Expectations 


To prove that linear expectations meet Requirement 2 (decidability of quantitative
entailments), we effectively reduce the question of whether an entailment
h ⪯ h' holds to the decidable satisfiability problem for QF_LIRA, i.e., quantifier-free
mixed linear integer and real arithmetic (cf. [42]).

As a first step, we show that every linear expectation can be represented as 
a sum of mutually exclusive extended arithmetic expressions—a representation 
we refer to as the guarded normal form (similar to [41, Lem. 1], [9, Lem. A.2]). 


Definition 2 (Guarded Normal Form (GNF)). h ∈ LinExp is in GNF if


\[ h = \sum_{i=1}^{n} [\varphi_i] \cdot \tilde{e}_i, \]


where ẽ_1, ..., ẽ_n are extended linear expressions, n ∈ ℕ is some natural number,
and φ_1, ..., φ_n are linear Boolean expressions that partition the set of states, i.e.,
for each σ ∈ Σ there exists exactly one i ∈ {1, ..., n} such that φ_i(σ) = true.


Lemma 3. Every linear expectation h ∈ LinExp can effectively be transformed
into an equivalent linear expectation GNF(h) ≡ h in guarded normal form.


The number of summands |GNF(h)| in GNF(h) is, in general, exponential in the
number of summands in h. In practice, however, this exponential blow-up can
often be mitigated by pruning summands with unsatisfiable guards. Throughout
the remainder of this paper, we denote the components of GNF(h) and GNF(h'),
where h and h' are arbitrary linear expectations, as follows:


\[ \mathrm{GNF}(h) = \sum_{i=1}^{n} [\varphi_i] \cdot \tilde{e}_i \qquad\text{and}\qquad \mathrm{GNF}(h') = \sum_{j=1}^{m} [\psi_j] \cdot \tilde{a}_j. \]


We now present a decision procedure for the quantitative entailment over LinExp.


Theorem 7 (Decidability of Quantitative Entailment over LinExp).
For h, h' ∈ LinExp, it is decidable whether h ⪯ h' holds.


Proof. Let h, h' ∈ LinExp. By Lemma 3, we have h ⪯ h' iff GNF(h) ⪯ GNF(h').
Let σ be some state. By definition of the GNF, σ satisfies exactly one guard
φ_i and exactly one guard ψ_j. Hence, the inequality GNF(h)(σ) ≤ GNF(h')(σ)
does not hold iff ẽ_i(σ) > ã_j(σ) holds for the expressions ẽ_i and ã_j guarded by φ_i
and ψ_j, respectively. Based on this observation, we construct a QF_LIRA formula
cex_⪯(h, h') that is unsatisfiable iff there is no counterexample to h ⪯ h':


\[ \mathrm{cex}_{\preceq}(h, h') \triangleq \bigvee_{i=1}^{n}\ \bigvee_{\substack{j=1 \\ \tilde{a}_j \neq \infty}}^{m} \bigl(\varphi_i \land \psi_j \land \mathit{encodeInfty}(\tilde{e}_i) > \tilde{a}_j\bigr). \]


Here, we identify every program variable in h or h' with an ℕ-valued SMT
variable. Moreover, to account for comparisons with ∞, we rely on the fact that
our (extended) arithmetic expressions either evaluate to ∞ for every state or
never evaluate to ∞. To deal with the case ẽ_i > ∞, which is always false, we
can thus safely exclude cases in which ã_j = ∞ holds. To deal with the case
∞ > ã_j, we represent ∞ by some unbounded number, i.e., we introduce a fresh,
unconstrained ℕ-valued SMT variable infty and set encodeInfty(ẽ) to infty if
ẽ = ∞; otherwise, encodeInfty(ẽ) = ẽ. Since QF_LIRA is decidable (cf. [42]), we
conclude that the quantitative entailment problem is decidable.


Since quantitative entailments are decidable, we can already conclude that, for 
linear expectations, Algorithm 2 is a semi-decision procedure. 


7.3 Computing Minima of Linear Expectations 


To ensure that latticed k-induction on pGCL programs (cf. Algorithm 1
and Sect. 7) is a semi-decision procedure when considering linear expectations,
we have to consider Requirement 3, i.e., the expressibility and computability of
meets:


Theorem 8. LinExp is effectively closed under taking minima. 


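Proof. For k ∈ ℕ, let \underline{k} = {1, ..., k}. Then, for two linear expectations h, h', the
linear expectation GNF(h) min GNF(h') ∈ LinExp is given by:


\[
\sum_{(i,j) \,\in\, \underline{n} \times \underline{m}}
\begin{cases}
[\varphi_i \land \psi_j] \cdot \tilde{a}_j, & \text{if } \tilde{e}_i = \infty, \\
[\varphi_i \land \psi_j] \cdot \tilde{e}_i, & \text{if } \tilde{a}_j = \infty, \\
[\varphi_i \land \psi_j \land \tilde{e}_i \le \tilde{a}_j] \cdot \tilde{e}_i + [\varphi_i \land \psi_j \land \tilde{e}_i > \tilde{a}_j] \cdot \tilde{a}_j, & \text{otherwise,}
\end{cases}
\]

where we exploit that, for every state, exactly one guard φ_i and exactly one
guard ψ_j is satisfied (cf. Lemma 3). Notice that in the last case we indeed obtain
a linear expectation since neither ẽ_i nor ã_j is equal to ∞.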
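
The case distinction in the proof of Theorem 8 can be mirrored in a few lines of Python. This is a semantic sketch under stated assumptions, not the symbolic construction used for SMT solving: guards and expressions are modeled as Python callables on states, the constant ∞ is represented by the placeholder INF, and the names gnf_min and evaluate are hypothetical. It assumes that both inputs are already in guarded normal form, so that exactly one guard of each input holds in every state.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Optional, Tuple, Union

State = Dict[str, int]
Guard = Callable[[State], bool]
Expr = Optional[Callable[[State], Fraction]]  # None stands for infinity
Gnf = List[Tuple[Guard, Expr]]

INF = None  # placeholder for the constant expression "infinity"


def gnf_min(left: Gnf, right: Gnf) -> Gnf:
    """Summands of the pointwise minimum of two expectations given in GNF."""
    result: Gnf = []
    for phi_i, e_i in left:
        for psi_j, a_j in right:
            both = lambda s, p=phi_i, q=psi_j: p(s) and q(s)
            if e_i is INF:
                result.append((both, a_j))
            elif a_j is INF:
                result.append((both, e_i))
            else:
                # Split the joint guard according to which expression is smaller.
                result.append(
                    (lambda s, b=both, e=e_i, a=a_j: b(s) and e(s) <= a(s), e_i)
                )
                result.append(
                    (lambda s, b=both, e=e_i, a=a_j: b(s) and e(s) > a(s), a_j)
                )
    return result


def evaluate(gnf: Gnf, state: State) -> Union[Fraction, float]:
    """Sum of all summands whose guard holds in `state`."""
    total = Fraction(0)
    for guard, expr in gnf:
        if guard(state):
            if expr is INF:
                return float("inf")
            total += expr(state)
    return total


# Hypothetical inputs: h = [x < 5] * x + [x >= 5] * infinity and h' = [true] * 3.
h: Gnf = [(lambda s: s["x"] < 5, lambda s: Fraction(s["x"])),
          (lambda s: s["x"] >= 5, INF)]
h_prime: Gnf = [(lambda s: True, lambda s: Fraction(3))]
minimum = gnf_min(h, h_prime)
print([evaluate(minimum, {"x": v}) for v in (2, 4, 7)])
```

The number of summands grows with the product of the input sizes, which is consistent with the remark above that pruning summands with unsatisfiable guards matters in practice.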


8 Implementation 


We have implemented a prototype called KIPRO2 (k-Induction for PRObabilistic
PROgrams) in Python 3.7 using the SMT solver Z3 [54] and the solver-API
PySMT [25]. Our tool, its source code, and our experiments are available online.⁸


⁸ https://github.com/moves-rwth/kipro2.
KIPRO2 performs latticed k-induction and BMC in parallel to fully automatically
verify upper bounds on expected values of pGCL programs as described in
Sect. 7. In addition to reasoning about expected values, KIPRO2 supports verifying
bounds on expected runtimes of pGCL programs, which are characterized
as least fixed points à la [40]. Rather than fixing a specific runtime model, we
took inspiration from [56] and added a statement tick(n) that does not affect
the program state but consumes n ∈ ℕ time units.

To discharge quantitative entailments and compute the meet, we use the
constructions in Theorems 7 and 8, respectively. As an additional optimization, we
do not iteratively apply the k-induction operator Ψ_f directly but use an incremental
encoding. We briefly sketch our encoding for k-induction (Algorithm 1);
the encoding for BMC is similar. In both cases, we employ uninterpreted functions
on top of mixed integer and real arithmetic, i.e., QF_UFLIRA.

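Recall Example 2, the geometric loop C_geo, where we used k-induction to
prove wp⟦C_geo⟧(c) ⪯ c + 1. For every k ∈ ℕ, Φ(Ψ_{c+1}^k(c + 1)) is given by


\[
\underbrace{[x = 1] \cdot \Bigl(0.5 \cdot \underbrace{\Psi_{c+1}^{k}(c+1)}_{Q_k}[x/0] + 0.5 \cdot \underbrace{\Psi_{c+1}^{k}(c+1)}_{Q_k}[c/c+1]\Bigr) + [x \neq 1] \cdot c}_{P_k}.
\]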
To obtain an incremental encoding, we introduce an uninterpreted function
P_k: ℕ × ℕ → ℝ_{≥0} and a formula ρ_k(c, x) specifying that P_k(c, x) characterizes
Φ(Ψ_{c+1}^k(c + 1)), i.e., for all σ ∈ Σ and r ∈ ℝ_{≥0} with Φ(Ψ_{c+1}^k(c + 1))(σ) < ∞,⁹


\[ \rho_k(\sigma(c), \sigma(x)) \land P_k(\sigma(c), \sigma(x)) = r \ \text{ is satisfiable iff } \ r = \Phi\bigl(\Psi_{c+1}^{k}(c+1)\bigr)(\sigma). \]


If Φ(Ψ_{c+1}^k(c + 1))(σ) = ∞, our construction of ρ_k(c, x) ensures that the above
conjunction is satisfiable for arbitrarily large r. Analogously, we introduce an
uninterpreted function Q_k: ℕ × ℕ → ℝ_{≥0} that characterizes Ψ_{c+1}^k(c + 1).

In particular, ρ_k may use all uninterpreted functions introduced for smaller
or equal values of k, not just the function P_k(c, x) it needs to characterize.
This enables an incremental encoding, i.e., ρ_k(c, x) can be computed on top of
ρ_{k−1}(c, x) by reusing P_{k−1}(c, x), Q_k(c, x), and the construction in Theorem 8.

Moreover, we can reuse ρ_k(c, x) to avoid computing the (expensive) GNF for
deciding certain quantitative entailments (cf. Theorem 7): For example, to check
whether Φ(Ψ_{c+1}^k(c + 1)) ⋠ h' holds, we only need to transform the right-hand
side into GNF (cf. Sect. 7.2), i.e., if GNF(h') = Σ_{j=1}^{m} [ψ_j] · ã_j, then


\[ \Phi\bigl(\Psi_{c+1}^{k}(c+1)\bigr) \not\preceq h' \quad\text{iff}\quad \rho_k(c, x) \land \bigvee_{j=1}^{m} \bigl(\psi_j \land P_k(c, x) > \tilde{a}_j\bigr) \ \text{ is satisfiable.} \]


⁹ Notice that we do not axiomatize in ρ_k(c, x) that Φ(Ψ_{c+1}^k(c + 1)) and P_k(c, x) are the
same function because we have no access to universal quantifiers. Rather, we specify
that both functions coincide for any fixed concrete values assigned to c and x.
This weaker notion is not robust against formal modifications of the parameters,
e.g., through substitution. For example, to assign the correct interpretation to
P_k(c, x)[c/c + 1], we have to construct a (second) formula ρ_k(c, x)[c/c + 1].
9 Experiments


We evaluate KIPRO2 on two sets of benchmarks. The first set, shown in Table 2, 
consists of four (infinite-state) probabilistic systems compiled from the literature; 
each benchmark is evaluated on multiple variants of candidate upper bounds: 


(1) brp is a pGCL variant of the bounded retransmission protocol [19,32]. The 
goal is to transmit toSend many packages via an unreliable channel allowing
for at most maxFail many retransmissions per package (cf. Example 1).
The variable totalFail keeps track of the total number of failed attempts 
to send a package. We verified upper bounds on the expected outcome of 
totalFail (variants 1-4). In doing so, we bound the number of packages to 
send by 4 (10, 20, 70) while keeping maxFail unbounded, i.e., we still verify 
an infinite-state system. We notice that k > 1 is required for proving any 
of the candidate bounds; for up to k = 11, KIPRO2 manages to prove non- 
trivial bounds within a few seconds. However, unsurprisingly, the complexity 
increases rapidly with larger k. While KIPRO2 can prove variant 3, it needs 
to increase k to 23; we observe that the complexity grows rapidly both in 
terms of the size of formulae and in terms of runtime with increased k. 
Furthermore, variants 5-7 correspond to (increasing) incorrect candidate 
bounds (totalFail+ 1, totalFail+ 1.5, totalFail+ 3) that are refuted (or time 
out) when not imposing any restriction on toSend. 

(2) geo corresponds to the geometric loop from Example 2. We verify that c+1 
upper-bounds the expected value of c for every initial state (variant 1); we 
refute the incorrect candidates c + 0.99 and c + 0.999999999999 (variants 
2-3). 

(3) rabin is a variant of Rabin’s mutual exclusion algorithm [46] taken from [34]. 
We aim to verify that the probability of obtaining a unique winning process 
is at most 2/3 for at most 2 (3, 4) participants (variants 1-3) and refute both 
1/3 (variant 4) and 3/5 (variant 5) for an unbounded number of participants. 

(4) unif_gen implements the algorithm in [48] for generating a discrete uniform
distribution over some interval {l, ..., l + n − 1} using only fair coin flips. We
aim to verify that 1/n upper-bounds the probability of sampling a particular
element from any such interval of size at most n = 2 (3, 4, 5, 6) (variants
1-5).


Our second set of benchmarks, shown in Table 3, confirms the correctness of 
(1-inductive) bounds on the expected runtime of pGCL programs synthesized by 
the runtime analyzers ABSYNTH [56] and (later) KOAT [52]; this gives a baseline 
for evaluating the performance of our implementation. Moreover, it demonstrates 
the flexibility of our approach as we effortlessly apply the expected runtime 
calculus [40] instead of the weakest preexpectation calculus for verification. 


Setup. We ran Algorithms 1 and 2 in parallel using an AMD Ryzen 5 3600X pro- 
cessor with a shared memory limit of 8GB and a 15-minute timeout. For every 
benchmark finishing within the time limit, KIPRO2 either finds the smallest k 
required to prove the candidate bound by k-induction or the smallest unrolling 
depth k to refute it. If KIPRO2 refutes, the SMT solver provides a concrete initial
state witnessing that violation. In Tables 2 and 3, column #formulae gives
the maximal number of conjuncts on the solver stack; formulae_t, sat_t, and 
total_t give the amount of time spent on (1) computing formulae, (2) satisfia- 
bility checking, and (3) everything (including preprocessing), respectively. The 
input consists of a program, a candidate upper bound, and a postexpectation; 
in Table 3, the latter is fixed to “postruntime” 0 and thus omitted. 


Table 2. Empirical results for the first benchmark set (time in seconds).


benchmark  postexpectation  variant  result  k   #formulae  formulae_t  sat_t  total_t

brp        totalFail        1        ind     5   285        0.15        0.01   0.28
                            2        ind     11  2812       1.77        0.12   2.03
                            3        ind     23  26284      17.68       28.09  45.94
                            4        TO      -   -          -           -      -
                            5        ref     13  949        0.84        14.39  15.28
                            6        TO      -   -          -           -      -
                            7        TO      -   -          -           -      -

geo        c                1        ind     2   18         0.01        0.00   0.08
                            2        ref     11  103        0.04        0.01   0.09
                            3        ref     46  1223       0.39        0.04   0.48

rabin                       1        ind     1   21         0.01        0.00   0.15
                            2        ind     5   1796       1.27        0.03   1.44
                            3        TO      -   -          -           -      -
                            4        ref     4   458        0.31        0.03   0.40
                            5        ref     8   10508      8.76        2.85   11.68

unif_gen   [e = i]          1        ind     2   267        0.27        0.02   0.56
                            2        ind     3   1402       1.45        0.10   1.81
                            3        ind     3   1402       1.48        0.11   1.86
                            4        ind     5   40568      47.31       15.70  63.28
                            5        TO      -   -          -           -      -


Evaluation of Benchmark Set 1. Table 2 empirically underlines that probabilistic 
program verification can benefit from k-induction to the same extent as classical 
software verification: KIPRO2 fully automatically verifies relevant properties of 
infinite-state randomized algorithms and stochastic processes from the literature 
that require k to be strictly larger than 1. That is, proving these properties 
using (1-)inductive invariants requires either non-trivial invariant synthesis or 
additional user annotations. This indicates that k-induction mitigates the need 
for complicated specifications in probabilistic program verification (cf. [40]). 
We observe that k-induction tends to succeed if some variable is bounded 
in the candidate upper bound under consideration (cf. brp, rabin, unif_gen). 
However, k-induction can also succeed without any bounds (cf. geo). The time 
and formulae required for checking k-inductivity increase rapidly for larger k;
this is particularly striking for rabin and unif_gen. When refuting candidate 
bounds with BMC, we obtain a similar picture. Both the time and formulae 
required for refutation increase if the candidate bound increases (cf. brp, geo, 
rabin). 

For both k-induction and BMC, we observe a direct correlation between the 
complexity of the loop, i.e., the number of possible traces through the loop 
from some fixed initial state after some bounded number of iterations, and the 
required time and space (number of formulae). Whereas for geo and brp—which 
exhibit a rather simple structure—these checks tend to be fast, this is not the 
case for rabin and unif_gen, which have more complex loop bodies. For such 
complex loops, k-induction and BMC quickly become infeasible as k increases. 


Table 3. Empirical results for (a subset of) the ERTs [56] (time in milliseconds).


program           runtime bound candidate           result  k  #formulae  formulae_t  sat_t  total_t
2drwalk           2·(n + 1 ∸ d)                     TO      -  -          -           -      -
bayesian network  5·n                               TO      -  -          -           -      -
ber               2·(n ∸ x)                         ind     1  9          7.22        0.44   88.12
C4B_t303          0.5·(x + 2) + 0.5·(y + 2)         ind     3  129        91.38       10.01  216.11
condand           m + n                             ind     1  10         7.10        0.43   76.21
fcall             2·(n ∸ x)                         ind     1  9          6.73        0.41   75.73
hyper             5·(n ∸ x)                         ind     1  11         7.24        0.46   97.52
linear01          0.6·x                             ind     1  11         7.19        0.49   74.38
prdwalk           1.14286·(n + 4 ∸ x)               ind     1  17         7.64        0.72   194.44
prspeed           2·(m ∸ y) + 0.6666667·(n ∸ x)     ind     1  18         7.64        0.81   145.13
race              0.666667·(t + 9 ∸ h)              ind     1  30         9.21        0.86   695.89
rdspeed           2·(m ∸ y) + 0.666667·(n ∸ x)      ind     1  19         7.70        0.78   143.45
rdwalk            2·(n + 1 ∸ x)                     ind     1  12         10.22       0.75   85.03
sprdwalk          2·(n ∸ x)                         ind     1  9          7.28        0.42   83.40


Evaluation of Benchmark Set 2. From Table 3, we observe that—in almost every 
case—verification is instantaneous and requires very few formulae. The programs 
we verify are equivalent to the programs provided in [56] up to interpreting minus
as monus and using ℕ-typed (instead of ℤ-typed) variables. A manual inspection reveals
that this matters for C4B_t303 and rdwalk, which is the reason why the runtime
bound for C4B_t303 is 3-inductive rather than 1-inductive.

There are two timeouts (2drwalk, bayesian network) due to the GNF con- 
struction from Lemma 3, which exhibits a runtime exponential in the number of 
possible execution branches through the loop body. We conjecture that further 
preprocessing (by pruning infeasible branches upfront) can mitigate this, render- 
ing 2drwalk and bayesian network tractable as well. We consider a thorough 
investigation of suitable preprocessing strategies for GNF construction, which is 
outside the scope of this paper, a worthwhile direction for future research. 
10 Conclusion


We presented κ-induction, a generalization of classical k-induction to arbitrary
complete lattices, and, together with a complementary bounded model checking
approach, obtained a fully automated technique for verifying infinite-state
probabilistic programs. Experiments showed that this technique can automatically
prove non-trivial properties that existing techniques cannot prove, at least
not without synthesizing a stronger inductive invariant. If
a given candidate bound is k-inductive for some k, then our prototypical tool
will find that k for linear programs and linear expectations. In theory, our tool
is also applicable to non-linear programs at the expense of an undecidable
quantitative entailment problem. It is left for future work to consider (positive)
real-valued program variables for non-linear expectations.


Acknowledgements. B. L. Kaminski thanks Larry Fischer for his linguistic advice. 
Abstract. We investigate the problem of monitoring partially observable
systems with nondeterministic and probabilistic dynamics. In such
systems, every state may be associated with a risk, e.g., the probability
of an imminent crash. During runtime, we obtain partial information
about the system state in the form of observations. The monitor uses this
information to estimate the risk of the (unobservable) current system
state. Our results are threefold. First, we show that extensions of state
estimation approaches do not scale due to the combination of nondeterminism
and probabilities. While exploiting a geometric interpretation of the
state estimates improves the practical runtime, this cannot prevent an 
exponential memory blowup. Second, we present a tractable algorithm 
based on model checking conditional reachability probabilities. Third, 
we provide prototypical implementations and manifest the applicability 
of our algorithms to a range of benchmarks. The results highlight the 
possibilities and boundaries of our novel algorithms. 


1 Introduction 


Runtime assurance is essential in the deployment of safety-critical (cyber- 
physical) systems [12,29,45,49,50]. Monitors observe system behavior and indicate
when the system is at risk of violating system specifications. A critical aspect
in developing reliable monitors is their ability to handle noisy or missing data. 
In cyber-physical systems, monitors observe the system state via sensors, i.e., 
sensors are an interface between the system and the monitor. A monitor has 
to base its decision solely on the obtained sensor output. These sensors are not 
perfect, and not every aspect of a system state can be measured. 

This paper considers a model-based approach to the construction of monitors 
for systems with imprecise sensors. Consider Fig. 1(b). We assume a model for 
the environment together with the controller. Typically, such a model contains 
both nondeterministic and probabilistic behavior, and thus describes a Markov 
decision process (MDP): In particular, the sensor is a stochastic process [56] that 
translates the environment state into an observation. For example, this could be a
perception module on a plane that during landing estimates the movements of an 
on-ground vehicle, as depicted in Fig. 1(a). Due to lack of precise data, the vehicle
movements themselves may be most accurately described using nondeterminism.

We are interested in the associated state risk of the current system state. 
The state risk may encode, e.g., the probability that the plane will crash with 
the vehicle within a given number of steps, or the expected time until reaching 
the other side of the runway. The challenge is that the monitor cannot directly 
observe the current system state. Instead, the monitor must infer from a trace 
of observations the current state risk. This cannot be done perfectly as the sys- 
tem state cannot be inferred precisely. Rather, we want a sound, conservative 
estimate of the system state. More concretely, for a fixed resolution of the non- 
determinism, the trace risk is the weighted sum over the probability of being 
in a state having observed the trace, times the risk imposed by this state. The 
monitoring problem is to decide whether for any possible scheduler resolving the 
nondeterminism the trace risk of a given trace exceeds a threshold. 

Monitoring of systems that contain either only probabilistic or only nonde- 
terministic behavior is typically based on filtering. Intuitively, the monitor then 
estimates the current system states based on the model. For purely nondeter- 
ministic systems (without probabilities) a set of states needs to be tracked, and 
purely probabilistic systems (without nondeterminism) require tracking a dis- 
tribution over states. This tracking is rather efficient. For systems that contain 
both probabilistic and nondeterministic behavior, filtering is more challenging. 
In particular, we show that filtering on MDPs results in an exponential memory 
blowup as the monitor must track sets of distributions. We show that a reduc- 
tion based on the geometric interpretation of these distributions is essential for 
practical performance, but cannot avoid the worst-case exponential blowup. As a 
tractable alternative to filtering, we rephrase the monitoring problem as the com- 
putation of conditional reachability probabilities [9]. More precisely, we unroll 
and transform the given MDP, and then model check this MDP. This alternative 
approach yields a polynomial-time algorithm. Indeed, our experiments show the 
feasibility of computing the risk by computing conditional probabilities. We also 
show benchmarks on which filtering is a competitive option. 


Contribution and Outline. This paper presents the first runtime monitoring
approach for systems that can be adequately abstracted by a combination of probabilities
and nondeterminism and where the system state is partially observable. We
describe the use case, show that typical filtering approaches in general fail to deal 
with this setting, and show that a tractable alternative solution exists. In Sect. 3, 
we investigate forward filtering, used to estimate the possible system states in 
partially observable settings. We show that this approach is tractable for sys- 
tems that have probabilistic or nondeterministic uncertainty, but not for systems 
that have both. To alleviate the blowup, Sect. 4 discusses an (often) efficacious 
pruning strategy and its limitations. In Sect. 5 we consider model checking as 
a more tractable alternative. This result utilizes constructions from the analysis
of partially observable MDPs and model checking MDPs with conditional properties.
In Sect. 6 we present baseline implementations of these algorithms, on
top of the open-source model checker STORM, and evaluate their performance.
The results show that the implementation allows for monitoring of a variety of
MDPs, and reveal strengths and weaknesses of both algorithms. We start
with a motivating example and review related work at the end of the paper.

Fig. 1. A probabilistic world and sensor model represented by two MDPs for the scenario
of an airplane in landing approach with on-ground vehicle movements.


Motivating Example. Consider a scenario where an autonomous airplane is 
in its final approach, i.e., lined up with a designated runway and descending for 
landing, see Fig. 1(a). On the ground, close to the runway, maintenance vehicles 
may cross the runway. The airplane tracks the movements of these vehicles and 
has to decide, depending on the movements of the vehicles, whether to abort 
the landing. To simplify matters, assume that the airplane (P) is tracking the 
movement of one vehicle (V) that is about to cross the runway. Let us further 
assume that P tracks V using a perception module that can only determine the 
position of the vehicle with a certain accuracy [33], i.e., for every position of V, 
the perception module reports a noisy variant of the position of V. However, it 
is important to know that the plane obtains a sequence of these measurements. 
Figure 1 illustrates the dynamics of the scenario. The world model describing
the movements of V and P is given in Fig. 1(c), where D2, D1, and D0 define
how close P is to the runway, and R, M, and L define the position of V. Depending
on what information V perceives about P, given by the atomic proposition
{(p)rogress}, and what commands it receives, {(w)ait}, it may or may not cross
the runway. The perception module receives the information about the state of
the world and reports with a certain accuracy (given as a probability) the position
of V. The (simple) model of the perception module is given in Fig. 1(d). For
example, if P is in zone D2 and V is in R, then there is a high chance that the perception
module returns that V is on the runway. The probability of incorrectly
detecting V's position reduces significantly when P is in D0.

A monitor responsible for making the decision to land or to perform a go- 
around based on the information computed by the perception module, must take 
into consideration the accuracy of this returned information. For example, if the 
sequence of sensor readings passed to the monitor is the sequence τ = Ro · Ro · Mo,
and each state is mapped to a certain risk, then how risky is it to land after
seeing τ? If, for instance, the world is with high probability in state (M, D0), a
very risky state, then the plane should go around. In the paper, we address the 
question of computing the risk based on this observation sequence. We will use 
this example as our running example. 


2 Monitoring with Imprecise Sensors 


In this section, we formalize the problem of monitoring with imprecise sensors 
when both the world and sensor models are given by MDPs. We start with a 
recap of MDPs, define the monitoring problem for MDPs, and finally show how 
the dynamics of the system under inspection can be modeled by an MDP defined 
by the composition of two MDPs of the sensors and world model of the system. 


2.1 Markov Decision Processes 


For a countable set X, let Distr(X) ⊂ (X → [0,1]) define the set of all distributions
over X, i.e., for d ∈ Distr(X) it holds that ∑_{x ∈ X} d(x) = 1. For d ∈ Distr(X),
let the support of d be defined by supp(d) := {x | d(x) > 0}. We call a distribution
d Dirac if |supp(d)| = 1.


Definition 1 (Markov decision process). A Markov decision process is a
tuple M = (S, ι, Act, P, Z, obs), where S is a finite set of states, ι ∈ Distr(S)
is an initial distribution, Act is a finite set of actions, P: S × Act ⇀ Distr(S)
is a partial transition function, Z is a finite set of observations, and obs: S →
Distr(Z) is an observation function.


Remark 1. The observation function can also be defined as a state-action obser- 
vation function obs: S x Act — Distr(Z). MDPs with state-action observation 
function can be easily transformed into equivalent MDPs with a state observation 
function using auxiliary states [19]. Throughout the paper we use state-action 
observations to keep (sensor) models concise. 

For a state s ∈ S, we define AvAct(s) = {α | P(s, α) ≠ ⊥}. W.l.o.g.,
|AvAct(s)| ≥ 1. If all distributions in M are Dirac, we refer to M as a
Kripke structure (KS). If |AvAct(s)| = 1 for all s ∈ S, we refer to M as
a Markov chain (MC). When Z = S, we refer to M as fully observable and
omit Z and obs from its definition. A finite path in an MDP M is a sequence
π = s_0 a_0 s_1 ... s_n ∈ S × (Act × S)* such that for every 0 ≤ i < n it holds that
P(s_i, a_i)(s_{i+1}) > 0 and ι(s_0) > 0. We denote the set of finite paths of M by
Π_M. The length of the path is given by the number of actions along the path.
The set Π_M^n for some n ∈ N denotes the set of finite paths of length n. We use π↓
to denote the last state in π. We omit M whenever it is clear from the context.
A trace is a sequence of observations τ = z_0 ... z_n ∈ Z+. Every path induces a
distribution over traces.

As standard, any nondeterminism is resolved by means of a scheduler. 


Definition 2 (Scheduler). A scheduler for an MDP M is a function
σ: Π_M → Distr(Act) with supp(σ(π)) ⊆ AvAct(π↓) for every π ∈ Π_M.


We use Sched(M) to denote the set of schedulers. For a fixed scheduler σ ∈
Sched(M), the probability Pr_σ(π) of a path π (under the scheduler σ) is the
product of the transition probabilities in the induced Markov chain. For more
details we refer the reader to [8]. 
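
As a minimal sketch of this definition, the following Python snippet computes Pr_σ(π) for a finite path. The dictionary encoding of ι and P, the name path_probability, and the representation of a scheduler as a Python function on path prefixes are assumptions made purely for illustration; they are not part of the paper's formalization or of any tool.

```python
def path_probability(init, trans, scheduler, path):
    """Probability of a finite path s0 a0 s1 ... sn under a scheduler.

    init:      dict state -> initial probability (iota)
    trans:     dict state -> dict action -> dict successor -> probability (P)
    scheduler: function mapping a path prefix (a list ending in a state)
               to a dict action -> probability (hypothetical encoding of sigma)
    path:      list [s0, a0, s1, ..., sn] alternating states and actions
    """
    prob = init.get(path[0], 0)
    for k in range(1, len(path), 2):
        prefix, action, succ = path[:k], path[k], path[k + 1]
        state = prefix[-1]
        # Probability that the scheduler picks `action` after seeing `prefix` ...
        choice = scheduler(prefix).get(action, 0)
        # ... times the probability that P(state, action) moves to `succ`.
        step = trans[state].get(action, {}).get(succ, 0)
        prob *= choice * step
    return prob


# Toy MDP (illustrative only): from 'a', action 'go' reaches 'b' or 'c'.
init = {"a": 1.0}
trans = {"a": {"go": {"b": 0.5, "c": 0.5}, "stay": {"a": 1.0}}}
uniform = lambda prefix: {"go": 0.5, "stay": 0.5}
print(path_probability(init, trans, uniform, ["a", "stay", "a", "go", "b"]))
```

The scheduler receives the whole prefix, so history-dependent and randomizing schedulers in the sense of Definition 2 fit this encoding.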


2.2 Formal Problem Statement 


Our goal is to determine the risk that a system is exposed to having observed a
trace τ ∈ Z+. Let r: S → R≥0 map states in M to some risk in R≥0. We call
r a state-risk function for M. This function assigns to every state the risk that is
associated with being in that state. For example, in our experiments, we flexibly define the
state risk using the (expected reward extension of the) temporal logic PCTL [8]
to define the probability of reaching a fail state. For instance, we can define risk
as the probability to crash within H steps. The use of expected rewards allows
for even more flexible definitions.

Intuitively, to compute this risk of the system we need to determine the
current system state having observed τ, considering both the probabilistic and
nondeterministic context. To this end, we formalize the (conditional) probabilities
and risks of paths and traces. Let Pr_σ(π | τ) define the probability of a
path π, under a scheduler σ, having observed τ. Since a scheduler may define
many paths that induce the observation trace τ, we are interested in the weighted
risk over all paths, i.e., ∑_{π ∈ Π_M^{|τ|}} Pr_σ(π | τ) · r(π↓). The monitoring problem for
MDPs then conservatively over-approximates the risk of a trace by assuming
an adversarial scheduler, that is, by taking the supremum risk estimate over all
schedulers¹.


1 We later see in Lemma 8 that this is indeed a maximum. 

The Monitoring Problem. Given an MDP M, a state-risk r: S → R≥0,
an observation trace τ ∈ Z+, and a threshold λ ∈ [0, ∞), decide R_r(τ) > λ,
where the weighted risk function R_r: Z+ → R≥0 is defined as

\[
R_r(\tau) := \sup_{\sigma \in \mathit{Sched}(M)} \; \sum_{\pi \in \Pi_M^{|\tau|}} \Pr\nolimits_\sigma(\pi \mid \tau) \cdot r(\pi_\downarrow).
\]


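The conditional probability Pr_σ(π | τ) can be characterized using Bayes' rule²:

\[
\Pr\nolimits_\sigma(\pi \mid \tau) = \frac{\Pr(\tau \mid \pi) \cdot \Pr\nolimits_\sigma(\pi)}{\Pr\nolimits_\sigma(\tau)}.
\]

The probability Pr(τ | π) of a trace τ for a fixed path π is obs_tr(π)(τ), where

\[
\mathit{obs}_{tr}(s) := \mathit{obs}(s), \qquad
\mathit{obs}_{tr}(\pi \alpha s') := \{\, \tau \cdot z \mapsto \mathit{obs}_{tr}(\pi)(\tau) \cdot \mathit{obs}(s')(z) \,\},
\]

when |π| = |τ|, and obs_tr(π)(τ) = 0 otherwise. The probability Pr_σ(τ) of a trace τ
is ∑_π Pr_σ(π) · Pr(τ | π).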

We call the special variant with λ = 0 the qualitative monitoring problem.
The problems are (almost) equivalent on Kripke structures, where considering a 
single path to an adequate state suffices. Details are given in [36, Appendix]. 


Lemma 1. For Kripke structures the monitoring and qualitative monitoring 
problems are logspace interreducible. 


In the next sections we present two types of algorithms for the monitoring
problem. The first algorithm is based on the widespread (forward) filtering approach
[44]. The second is a new algorithm based on model checking conditional
probabilities. While filtering approaches are efficacious in a purely nondeterministic
or a purely probabilistic setting, they do not scale on models such as MDPs
that are both probabilistic and nondeterministic. In those models, model checking
provides a tractable alternative. Before going into details, we first connect
the problem statement more formally to our motivating example.


2.3 An MDP Defining the System Dynamics 


We show how the weighted risk for a system given by a world and sensor model 
can be formalized as a monitoring problem for MDPs. To this end, we define the 
dynamics of the world and sensors that we use as basis for our monitor as the 
following joint MDP. 

For a fully observable world MDP ℰ = (S_ℰ, ι_ℰ, Act_ℰ, P_ℰ) and a sensor MDP
𝒮 = (S_𝒮, ι_𝒮, S_ℰ, P_𝒮, Z, obs), where obs is state-action based, the inspected system
is defined by an MDP [[ℰ, 𝒮]] = (S_𝒥, ι_𝒥, Act_ℰ, P_𝒥, Z, obs_𝒥) being the synchronous
composition of ℰ and 𝒮:

- S_𝒥 := S_ℰ × S_𝒮,
- ι_𝒥 is defined as ι_𝒥((u, s)) := ι_ℰ(u) · ι_𝒮(s) for each u ∈ S_ℰ and s ∈ S_𝒮,
- P_𝒥: S_𝒥 × Act_ℰ → Distr(S_𝒥) such that for all (u, s) ∈ S_𝒥 and α ∈ Act_ℰ,

  P_𝒥((u, s), α) = d_{u,s} ∈ Distr(S_𝒥),

  where for all u' ∈ S_ℰ and s' ∈ S_𝒮: d_{u,s}((u', s')) = P_ℰ(u, α)(u') · P_𝒮(s, u)(s'),
- obs_𝒥: S_𝒥 → Distr(Z) with obs_𝒥: (u, s) ↦ obs(s, u).


2 For conciseness we assume throughout the paper that 0/0 = 0.

Fig. 2. A run with its observations of the inspected system [[ℰ, 𝒮]], where ℰ and 𝒮 are
the models given in Fig. 1.


In Fig. 2 we illustrate a run of [[ℰ, 𝒮]] for the world and sensor MDPs presented
in Fig. 1. We particularly show the observations of the joint MDP given
by the distributions over the observations for each transition in the run (we omitted
the probabilistic transitions for simplicity). The observations of the MDP M
present the output of the sensor upon a path through M. These observations in 
turn are the inputs to a monitor on top of the system. The role of the monitor 
is then to compute the risk of being in a critical state based on the received 
observations. 
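
The synchronous composition can be sketched in a few lines of Python. The nested-dictionary encoding, the function name compose, and the tiny two-state world used as an example are assumptions for illustration only; in particular, the probabilities below are made up and are not those of Fig. 1.

```python
from itertools import product


def compose(world_init, world_trans, sensor_init, sensor_trans, sensor_obs):
    """Synchronous composition of a world MDP and a sensor MDP.

    world_trans[u][a]  = {u2: prob}   (P of the world MDP)
    sensor_trans[s][u] = {s2: prob}   (P of the sensor; its "actions" are world states)
    sensor_obs[(s, u)] = {z: prob}    (state-action based observation function)
    """
    states = list(product(world_trans, sensor_trans))
    # Joint initial distribution: product of the two initial distributions.
    init = {(u, s): world_init.get(u, 0) * sensor_init.get(s, 0) for u, s in states}
    trans = {}
    for u, s in states:
        trans[(u, s)] = {
            # d_{u,s}((u2, s2)) = P_world(u, a)(u2) * P_sensor(s, u)(s2)
            a: {(u2, s2): pu * ps
                for u2, pu in dist.items()
                for s2, ps in sensor_trans[s][u].items()}
            for a, dist in world_trans[u].items()
        }
    # Joint observation: (u, s) -> obs(s, u).
    obs = {(u, s): sensor_obs[(s, u)] for u, s in states}
    return init, trans, obs


# Hypothetical two-state world with a single-state sensor.
world_init = {"R": 1.0}
world_trans = {"R": {"p": {"R": 0.5, "M": 0.5}}, "M": {"p": {"M": 1.0}}}
sensor_init = {"sense": 1.0}
sensor_trans = {"sense": {"R": {"sense": 1.0}, "M": {"sense": 1.0}}}
sensor_obs = {("sense", "R"): {"Ro": 0.9, "Mo": 0.1}, ("sense", "M"): {"Mo": 1.0}}
init, trans, obs = compose(world_init, world_trans, sensor_init, sensor_trans, sensor_obs)
```

The joint actions are exactly the world actions; the sensor only reacts to the current world state, which mirrors the role of S_ℰ as the action set of the sensor MDP.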


3 Forward Filtering for State Estimation 


We start by showing why standard forward filtering does not scale well on MDPs.
We briefly show how filtering can be used to solve the monitoring problem for
purely nondeterministic systems (Kripke structures) or purely probabilistic systems
(Markov chains). Then, we show why, for MDPs, forward filtering needs
to manage a finite but exponentially large set of distributions. In Sect. 4 we
present a new improved variant of forward filtering for MDPs based on filtering
with vertices of the convex hull. In Sect. 5 we present a new polynomial-time
model checking-based algorithm for solving the problem.


3.1 State Estimators for Kripke Structures


For Kripke structures, we maintain a set of possible states that agree with the
observed trace. This set of states is inductively characterized by the function
est_KS: Z+ → 2^S which we define formally below. For an observation trace τ,
est_KS(τ) defines the set of states that can be reached with positive probability.
This set can be computed by a forward state traversal [31]. To illustrate
how est_KS(τ) is computed for τ, consider the underlying Kripke structure of
the inspected system [[ℰ, 𝒮]] for our running example in Fig. 1 (to make this a
Kripke structure, we remove the probabilities). Consider further the observation
trace τ = Ro · Mo · Lo. Since [[ℰ, 𝒮]] has only one initial state ((R, D2), sense)
and Ro is observable with a positive probability in this state, est_KS(Ro) =
{((R, D2), sense)}. As Mo is observed next, est_KS(Ro · Mo) computes the states
reached from ((R, D2), sense) and where Mo can be observed with a positive
probability, i.e., est_KS(Ro · Mo) = {((R, D1), sense), ((M, D1), sense)}. Finally,
the current state having observed Ro · Mo · Lo may be one of the states est_KS(τ) =
{((M, D1), sense), ((L, D1), sense), ((L, D0), sense), ((M, D0), sense)}, which
especially shows that we might be in the high-risk world state (M, D0).


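Definition 3 (KS state estimator). For KS = (S, ι, Act, P, Z, obs), the state
estimation function est_KS: Z+ → 2^S is defined as

\[
\begin{aligned}
\mathrm{est}_{KS}(z) &:= \{\, s \in S \mid \iota(s) > 0 \wedge \mathit{obs}(s)(z) > 0 \,\}, \\
\mathrm{est}_{KS}(\tau \cdot z) &:= \{\, s' \in S \mid \exists s \in \mathrm{est}_{KS}(\tau),\ \exists \alpha \in \mathit{Act}.\ P(s,\alpha)(s') > 0 \wedge \mathit{obs}(s')(z) > 0 \,\}.
\end{aligned}
\]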


For a Kripke structure KS and a given trace τ, the monitoring problem can
be solved by computing est_KS(τ), using [31] and Lemma 1.


Lemma 2. For a Kripke structure KS = (S, ι, Act, P, Z, obs), a trace τ ∈ Z+, and
a state-risk function r: S → R≥0, it holds that R_r(τ) = max_{s ∈ est_KS(τ)} r(s). Computing
R_r(τ) requires time O(|τ| · |P|) and space O(|S|).


A proof can be found in [36, Appendix]. The time and space requirements follow
directly from the inductive definition of est_KS, which resembles solving a forward
state traversal problem in automata [31]. In particular, the algorithm allows
updating the result after extending τ in O(|P|).
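
The inductive definition translates directly into a forward traversal. The following Python sketch uses a hypothetical dictionary encoding and a made-up three-state structure; returning 0 for an empty estimate is an assumption made here for traces that cannot be observed.

```python
def est_ks(init, trans, obs, trace):
    """Set of states consistent with `trace` (est_KS).

    init:  dict state -> initial probability
    trans: dict state -> dict action -> dict successor -> probability
    obs:   dict state -> dict observation -> probability
    """
    # Base case: initial states in which the first observation is possible.
    current = {s for s, p in init.items()
               if p > 0 and obs[s].get(trace[0], 0) > 0}
    # Inductive case: successors in which the next observation is possible.
    for z in trace[1:]:
        current = {t
                   for s in current
                   for dist in trans[s].values()
                   for t, p in dist.items()
                   if p > 0 and obs[t].get(z, 0) > 0}
    return current


def ks_risk(init, trans, obs, risk, trace):
    """Weighted risk on a Kripke structure: maximum state risk over est_KS."""
    return max((risk[s] for s in est_ks(init, trans, obs, trace)), default=0)


# Illustrative structure (not the model of Fig. 1).
init = {"a": 1}
trans = {"a": {"go": {"b": 1}, "alt": {"c": 1}},
         "b": {"stay": {"b": 1}},
         "c": {"stay": {"c": 1}}}
obs = {"a": {"x": 1}, "b": {"y": 1}, "c": {"x": 1, "y": 1}}
risk = {"a": 0.0, "b": 0.2, "c": 0.9}
print(est_ks(init, trans, obs, ["x", "y"]), ks_risk(init, trans, obs, risk, ["x", "y"]))
```

Each step only inspects the transitions leaving the current set, which matches the O(|P|) update cost per observation stated above.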


3.2 State Estimators for Markov Chains 


For Markov chains, in addition to tracking the potential reachable system states, 
we also need to take the transition probabilities into account. When a system 
is (observation-)deterministic, we can adapt the notion of beliefs, similar to 
RVSE [54], and similar to the construction of belief MDPs for partially observable 
MDPs, cf. [53]: 


Definition 4 (Belief). For an MDP M with a set of states S, a belief bel is 
a distribution in Distr(S). 


In the remainder of the paper, we will denote the function S → {0} by 0 and
the set Distr(S) ∪ {0} by Bel. A state estimator based on Bel is then defined as
follows [51,54,57]³:


3 For the deterministic case, we omit the unique action for brevity. 
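
Definition 5 (MC state estimator). For MC = (S, ι, Act, P, Z, obs) and a trace
τ ∈ Z+, the state estimation function est_MC: Z+ → Bel is defined as

\[
\mathrm{est}_{MC}(z) :=
\begin{cases}
\left\{ s \mapsto \dfrac{\iota(s) \cdot \mathit{obs}(s)(z)}{\sum_{\hat{s} \in S} \iota(\hat{s}) \cdot \mathit{obs}(\hat{s})(z)} \right\} & \text{if } \exists s \in S.\ \iota(s) \cdot \mathit{obs}(s)(z) > 0, \\[2ex]
\mathbf{0} & \text{otherwise,}
\end{cases}
\]

\[
\mathrm{est}_{MC}(\tau \cdot z) :=
\left\{ s' \mapsto \frac{\sum_{s \in S} \mathrm{est}_{MC}(\tau)(s) \cdot P(s, s') \cdot \mathit{obs}(s')(z)}
{\sum_{s \in S} \mathrm{est}_{MC}(\tau)(s) \cdot \left( \sum_{\hat{s} \in S} P(s, \hat{s}) \cdot \mathit{obs}(\hat{s})(z) \right)} \right\}.
\]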


To illustrate how est_MC is computed, consider again our system in Fig. 1
and assume that the MDP has only the actions labeled with {p} (reducing it
to the Markov chain induced by a scheduler that only performs the {p}
actions). Again we consider the observation trace τ = Ro · Mo · Lo and compute
est_MC(τ). For the first observation Ro, and since there is only one initial state,
it follows that est_MC(Ro) = {(R, D2) ↦ 1}⁴. From (R, D2) and having observed
Mo we can reach the states (R, D1) and (M, D1) with probabilities est_MC(Ro ·


Mo) = {(R, Dı) > Ai z = 4, (M, Dı) | 2i z = &}. Finally, from 
3 2 4 

the latter two states, after observing Lo, the states (M, D0) and (L, D0) can be
reached with probabilities est_MC(Ro · Mo · Lo) = {(M, D0) ↦ 0.0001, (L, D0) ↦
0.999}. Notice that although the state (R, D0) can be reached from (R, D1), the
probability of being in this state is 0 since the probability of observing Lo in this
state is obs((R, D0))(Lo) = 0.


Lemma 3. For a Markov chain MC = (S, ι, Act, P, Z, obs), a trace τ ∈ Z+, and
a state-risk function r: S → R≥0, it holds that R_r(τ) = ∑_{s ∈ S} est_MC(τ)(s) · r(s).
Computing R_r(τ) can be done in time O(|τ| · |S| · |P|), and using |S| many
rational numbers. The size of the rationals⁵ may grow linearly in |τ|.


Proof Sketch. Since the system is deterministic, there is a unique scheduler σ,
thus R_r(τ) = ∑_{π ∈ Π_MC^{|τ|}} Pr_σ(π | τ) · r(π↓) by definition. We can show by induction
over the length of τ that Pr_σ(π | τ) = est_MC(τ)(π↓) and conclude that R_r(τ) =
∑_{π ∈ Π_MC^{|τ|}} est_MC(τ)(π↓) · r(π↓) = ∑_{s ∈ S} est_MC(τ)(s) · r(s) because est_MC(τ)(s) = 0
for all s ∈ S for which there is no path π ∈ Π_MC^{|τ|} with π↓ = s. The complexity
follows from the inductive definition of est_MC that requires in each inductive step
to iterate over all transitions of the system and maintain a belief over the states
of the system.
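
Forward filtering on a Markov chain can be sketched as follows. The encoding, the function names, and the three-state chain are illustrative assumptions; exact rationals from Python's fractions module are used so that the growth of the numbers mentioned in Lemma 3 is visible rather than hidden by rounding. The empty dictionary stands in for the all-zero belief 0.

```python
from fractions import Fraction


def normalize(weights):
    """Scale nonnegative weights to a distribution; {} encodes the belief 0."""
    total = sum(weights.values())
    if total == 0:
        return {}
    return {s: w / total for s, w in weights.items() if w}


def est_mc(init, trans, obs, trace):
    """Belief over states after observing `trace` on a Markov chain.

    init:  dict state -> probability
    trans: dict state -> dict successor -> probability (the unique action is omitted)
    obs:   dict state -> dict observation -> probability
    """
    bel = normalize({s: p * obs[s].get(trace[0], 0) for s, p in init.items()})
    for z in trace[1:]:
        weights = {}
        for s, b in bel.items():
            for t, p in trans[s].items():
                # Numerator of the update: bel(s) * P(s, t) * obs(t)(z).
                w = b * p * obs[t].get(z, 0)
                if w:
                    weights[t] = weights.get(t, 0) + w
        bel = normalize(weights)
    return bel


def mc_risk(init, trans, obs, risk, trace):
    """Weighted risk: expectation of the state risk under the belief."""
    return sum(p * risk[s] for s, p in est_mc(init, trans, obs, trace).items())


# Illustrative chain (not the model of Fig. 1).
init = {"a": Fraction(1)}
trans = {"a": {"b": Fraction(1, 2), "c": Fraction(1, 2)},
         "b": {"b": Fraction(1)},
         "c": {"c": Fraction(1)}}
obs = {"a": {"x": Fraction(1)},
       "b": {"y": Fraction(1)},
       "c": {"x": Fraction(1, 2), "y": Fraction(1, 2)}}
print(est_mc(init, trans, obs, ["x", "y"]))
```

Normalizing by the sum of the numerators is the same as dividing by the denominator in Definition 5, since that denominator is exactly the numerator summed over all successor states.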


3.3 State Estimators for Markov Decision Processes 


In an MDP, we have to account for every possible resolution of nondeterminism, 
which means that a belief can evolve into a set of beliefs: 


4 We omit the (single) sensor state for conciseness. 
5 To avoid growth, one may use fixed-precision numbers that over-approximate the 
probability of being in any state—inducing a growing (but conservative) error. 
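
Definition 6 (MDP state estimator). For an MDP M = (S, ι, Act,
P, Z, obs), a trace τ ∈ Z+, and a state-risk function r: S → R≥0, the state
estimation function est_MDP: Z+ → 2^Bel is defined as

\[
\begin{aligned}
\mathrm{est}_{MDP}(z) &= \{ \mathrm{est}_{MC}(z) \}, \\
\mathrm{est}_{MDP}(\tau \cdot z) &= \{\, \mathit{bel}' \in \mathit{Bel} \mid \exists\, \mathit{bel} \in \mathrm{est}_{MDP}(\tau).\ \mathit{bel}' \in \mathrm{est}^{up}_{MDP}(\mathit{bel}, z) \,\},
\end{aligned}
\]

and where bel' ∈ est^up_MDP(bel, z) if there exists σ_bel: S → Distr(Act) such that:

\[
\forall s'.\ \mathit{bel}'(s') =
\frac{\sum_{s \in S} \mathit{bel}(s) \cdot \sum_{\alpha \in \mathit{Act}} \sigma_{\mathit{bel}}(s)(\alpha) \cdot P(s, \alpha, s') \cdot \mathit{obs}(s')(z)}
{\sum_{s \in S} \mathit{bel}(s) \cdot \sum_{\alpha \in \mathit{Act}} \sigma_{\mathit{bel}}(s)(\alpha) \cdot \sum_{\hat{s} \in S} P(s, \alpha, \hat{s}) \cdot \mathit{obs}(\hat{s})(z)}.
\]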

The definition conservatively extends both Definition 3 and Definition 5. Fur- 
thermore, we remark that we do not restrict how the nondeterminism is resolved: 
any distribution over actions can be chosen, and the distributions may be dif- 
ferent for different traces. 

Consider our system in Fig. 1. For the trace τ = Ro · Mo · Lo, est_MDP(τ) is
computed as follows. First, when observing Ro, the state estimator computes
the initial belief set est_MDP(Ro) = {{(R, D2) ↦ 1}}. From this set of beliefs,
when observing Mo, a set est_MDP(Ro · Mo) can be computed since all transitions
∅, {p}, {w}, {p, w} (as well as their convex combinations) are possible from
(R, D2). One of these beliefs is, for example, the belief est_MC(Ro · Mo) over (R, D1)
and (M, D1), obtained when a scheduler takes the transition {p} (as was computed
in our example for the Markov chain case). Having additionally observed Lo, a new
set est_MDP(Ro · Mo · Lo) of beliefs can be computed based on the beliefs in
est_MDP(Ro · Mo). For example, from the belief est_MC(Ro · Mo), two of the new beliefs
are {(L, D0) ↦ 0.999, (M, D0) ↦ 0.0001} and {(M, D1) ↦ 0.0287, (M, D0) ↦
0.0001, (L, D0) ↦ 0.9712}. The first belief is reached by a scheduler that takes
a transition {p} at both (R, D1) and (M, D1). Notice that the belief does not
give a positive probability to the state (R, D0) because Lo cannot be observed
in this state. The second belief is reached by considering a scheduler that takes
transition {p} at (M, D1) and transition ∅ at (R, D1).


Theorem 1. For an MDP M = (S, ι, Act, P, Z, obs), a trace τ ∈
Z+, and a state-risk function r: S → R≥0, it holds that

\[
R_r(\tau) = \sup_{\mathit{bel} \in \mathrm{est}_{MDP}(\tau)} \; \sum_{s \in S} \mathit{bel}(s) \cdot r(s).
\]


Proof Sketch. For a given trace τ, each (history-dependent, randomizing) scheduler
induces a belief over the states of the Markov chain induced by the scheduler.
Also, each belief in est_MDP(τ) corresponds to a fixed scheduler, namely the one
used to compute the belief recursively (i.e., an arbitrary randomizing memoryless
scheduler for every time step). Once a scheduler σ and its corresponding
belief bel is fixed, or vice versa, we can show using induction over the length of
τ that ∑_{π ∈ Π_M^{|τ|}} Pr_σ(π | τ) · r(π↓) = ∑_{s ∈ S} bel(s) · r(s).

Fig. 3. Beliefs in R^n on M for τ = z0z0, z0z0z0, and z0z0z1, respectively.


4 Convex Hull-Based Forward Filtering 


In this section, we show that we can use a finite representation for est_MDP(τ),
but that this representation is exponentially large for some MDPs.


4.1 Properties of est_MDP(τ)


First, observe that 0 never maximizes the risk. Furthermore, 0 is closed
under updates, i.e., est^up_MDP(0, z) = {0}. We can thus w.l.o.g. assume that
0 ∉ est_MDP(τ). Second, observe that est_MDP(τ) ≠ ∅ if Pr_σ(τ) > 0.

We can interpret a belief bel ∈ Bel as a point in (a bounded subset of) R^(|S|−1).
We are in particular interested in convex sets of beliefs. A set B ⊆ Bel is convex if
the convex hull CH(B) of B, i.e., all convex combinations of beliefs in B⁶, coincides
with B, i.e., CH(B) = B. For a set B ⊆ Bel, a belief bel ∈ B is an interior belief
if it can be expressed as a convex combination of the beliefs in B \ {bel}. All other
beliefs are (extremal) points or vertices. Let the set V(B) ⊆ B denote the set of
vertices of the convex hull of B.


Example 1. Consider Fig. 3(a). All observations are Dirac, and only states s2
and s4 have observation z1. The beliefs having observed z0z0 are distributions
over s1, s3, and can thus be depicted in a one-dimensional simplex. In particular,
we have V(est_MDP(z0z0)) = {{s1 ↦ 1}, {s1 ↦ 3/4, s3 ↦ 1/4}}, as depicted in
Fig. 3(b). The six beliefs having observed z0z0z0 are distributions over s0, s1, s3,
depicted in Fig. 3(c). Five out of six beliefs are vertices. The belief having
observed z0z0z1 is in Fig. 3(d).


Remark 2. Observe that we illustrate the beliefs over only the states est_KS(τ).
We therefore call |est_KS(τ)| the dimension of est_MDP(τ).


From the fundamental theorem of linear programming [47, Ch. 7] it immediately
follows that the trace risk R_r(τ) is obtained at a vertex of the beliefs of est_MDP(τ).
We obtain the following refinement over Theorem 1:


Theorem 2. For every τ and r:

\[
R_r(\tau) = \max_{\mathit{bel} \in V(\mathrm{est}_{MDP}(\tau))} \; \sum_{s \in S} \mathit{bel}(s) \cdot r(s).
\]

Lemma 5 below clarifies that this maximum indeed exists.


6 That is, CH(B) = {∑_{bel ∈ B} w(bel) · bel | for all w ∈ R≥0^B with ∑_{bel ∈ B} w(bel) = 1}.

We make some observations that allow us to compute the vertices more efficiently:
Let est^up_MDP(B, z) denote ∪_{bel ∈ B} est^up_MDP(bel, z). From the properties of
convex sets [18, Ch. 2], we make the following observations: If B is convex,
est^up_MDP(B, z) is convex, as all operations in computing a new belief are convex-set
preserving⁷. Furthermore, if B has a finite set of vertices, then est^up_MDP(B, z)
has a finite set of vertices. The following lemma, which is based on the observations
above, clarifies how to compute the vertices:


Lemma 4. For a convex set of beliefs B with a finite set of vertices and an
observation z:

\[
V(\mathrm{est}^{up}_{MDP}(B, z)) = V(\mathrm{est}^{up}_{MDP}(V(B), z)).
\]

By induction and using the facts above we obtain:

Lemma 5. Any V(est_MDP(τ)) is finite.


A monitor thus only needs to track the vertices. Furthermore, est^up_MDP(B, z) can
be adapted to compute only vertices by limiting σ_bel to S → Act.
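
A minimal sketch of one such update step, restricted to deterministic choices S → Act, is given below. The encoding and the names update_beliefs and max_risk are assumptions for illustration. The function enumerates one action per state in the support of the belief, so it returns a finite superset of the vertices; it does not remove interior beliefs, which would require an additional convex-hull computation. The all-zero belief is dropped, as justified in Sect. 4.1.

```python
from fractions import Fraction
from itertools import product


def update_beliefs(bel, trans, obs, z):
    """Successor beliefs of `bel` after observing `z`, one per deterministic choice.

    bel:   dict state -> probability
    trans: dict state -> dict action -> dict successor -> probability
    obs:   dict state -> dict observation -> probability
    Returns a set of beliefs, each encoded as a frozenset of (state, probability).
    """
    support = [s for s, p in bel.items() if p > 0]
    results = set()
    # Enumerate one action per support state (a deterministic sigma_bel).
    for choice in product(*(sorted(trans[s]) for s in support)):
        weights = {}
        for s, a in zip(support, choice):
            for t, p in trans[s][a].items():
                w = bel[s] * p * obs[t].get(z, 0)
                if w:
                    weights[t] = weights.get(t, 0) + w
        total = sum(weights.values())
        if total:  # skip the all-zero belief
            results.add(frozenset((t, w / total) for t, w in weights.items()))
    return results


def max_risk(beliefs, risk):
    """Largest expected state risk over a finite set of beliefs."""
    return max(sum(p * risk[s] for s, p in b) for b in beliefs)


# Illustrative MDP fragment (hypothetical, not taken from the figures).
bel = {"i": Fraction(1)}
trans = {"i": {"safe": {"g": Fraction(1)},
               "risky": {"b": Fraction(1, 4), "g": Fraction(1, 4), "n": Fraction(1, 2)}}}
obs = {"g": {"y": 1}, "b": {"y": 1}, "n": {"z": 1}}
print(update_beliefs(bel, trans, obs, "y"))
```

The number of enumerated choices is the product of the numbers of available actions over the support, which is the source of the blow-up discussed next.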


4.2 Exponential Lower Bounds on the Relevant Vertices 


We show that a monitor in general cannot avoid an exponential blow-up in the
beliefs it tracks. First observe that updating bel yields up to ∏_s |Act(s)| new
beliefs (vertex or not), a prohibitively large number. The number of vertices is
also exponential:


Lemma 6. There exists a family of MDPs M_n with 2n + 1 states such that
|V(est_MDP(τ))| = 2^n for every τ with |τ| > 2.


Proof Sketch. We construct M_n with n = 3, that is, M_3 in Fig. 4(a). For this
MDP and τ = AAA, |V(est_MDP(τ))| = 2^3. In particular, observe how the belief
factorizes into a belief within each component C_i = {h_i, l_i} and notice that M_n
has components C_1 to C_n. In particular, for each component, the belief is that
we are with probability mass 1/n (for n = 3, 1/3) in the 'low' state l_i or the 'high'
state h_i. We depict the beliefs in Fig. 4(b,c,d). Thus, for any τ with |τ| > 2 we can
compactly represent V(est_MDP(τ)) as bit-strings of length n. Concretely, the belief

{h1, l2, l3 ↦ 1/3, l1, h2, h3 ↦ 0} maps to 100, and
{h1, l2, h3 ↦ 1/3, l1, h2, l3 ↦ 0} maps to 101.

These are exponentially many beliefs for bit strings of length n.
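
The bit-string encoding can be made concrete with a short Python sketch. It only enumerates the beliefs described in the proof sketch (mass 1/n on either l_i or h_i in each component); it does not construct the MDP M_n itself. The function name and the state labels "h1", "l1", ... are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def lemma6_beliefs(n):
    """Map each bit-string of length n to the belief it encodes.

    Bit i = 1 places mass 1/n on the 'high' state h_i,
    bit i = 0 places mass 1/n on the 'low' state l_i.
    """
    beliefs = {}
    for bits in product((0, 1), repeat=n):
        bel = {}
        for i, bit in enumerate(bits, start=1):
            bel[f"h{i}"] = Fraction(1, n) if bit else Fraction(0)
            bel[f"l{i}"] = Fraction(0) if bit else Fraction(1, n)
        beliefs["".join(map(str, bits))] = bel
    return beliefs


beliefs = lemma6_beliefs(3)
print(len(beliefs))    # number of bit-strings of length 3
print(beliefs["100"])  # mass 1/3 on h1, l2, l3
```

Since every component contributes an independent binary choice, the number of such beliefs doubles with each additional component.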
One might ask whether a symbolic encoding of an exponentially large set
may result in a more tractable approach to filtering. While Theorem 2 allows
us to compute the associated risk from a set of linear constraints with standard
techniques, it is not clear whether the concise set of constraints can be efficiently
constructed and updated in every step. We leave this concern for future work.


7 The scaling is called a projection.

Fig. 4. Construction for the correctness of Lemma 6. (a) M_3. Observations Z = Act
with obs(s, α) = α if α ≠ B and obs(s, B) = A for every s. Initial belief is A, all
probabilities are 1, unless stated otherwise. (b) Beliefs after AA. (d) Beliefs after AAAA.

In the remainder we investigate whether we need to track all these beliefs.
First, when the monitor is unaware of the state-risk, this is trivially unavoidable.
More precisely, all vertices may induce the maximal weighted trace risk by
choosing an appropriate state-risk:


Lemma 7. For every τ and every bel ∈ V(est_MDP(τ)) there exists an r s.t.

\[
\sum_{s} \mathit{bel}(s) \cdot r(s) \;>\; \max_{\mathit{bel}' \in V(\mathrm{est}_{MDP}(\tau)) \setminus \{\mathit{bel}\}} \; \sum_{s} \mathit{bel}'(s) \cdot r(s)
\qquad \text{with } \max \emptyset = -\infty.
\]


Proof Sketch. We construct r such that r(s) > r(s') if bel(s) > bel(s').

Second, even if the monitor is aware of the state risk r, it may not be able to 
prune enough vertices to avoid exponential growth. The crux here is that while 
some of the current beliefs may induce a smaller risk, an extension of the trace 
may cause the belief to evolve into a belief that induces the maximal risk. 


Theorem 3. There exist MDPs M_n, a trace τ with B := V(est_MDP(τ)), and a state-risk
r such that |B| = 2^n and for all bel ∈ B there exists τ' ∈ Z+ with

\[
R_r(\tau \cdot \tau') \;>\; \sup_{\mathit{bel}' \in B'} \sum_{s} \mathit{bel}'(s) \cdot r(s),
\qquad \text{where } B' = \mathrm{est}^{up}_{MDP}(B \setminus \{\mathit{bel}\}, \tau').
\]

It is helpful to understand this theorem as describing the outcome of a game
between monitor and environment: The statement says that if the monitor decides to
drop some vertices from est_MDP(τ), the environment may produce an observation
trace τ' that will lead the monitor to underestimate the weighted risk R_r(τ · τ').


Proof Sketch. We extend the construction of Fig. 4(a) with choices to go to a 
final state. The full proof sketch can be found in [36, Appendix]. 


4.3 Approximation by Pruning 


Finally, we illustrate that we cannot simply prune small probabilities from
beliefs. This indicates that an approximative version of filtering for the monitoring
problem is nontrivial. Reconsider observing z0z0 in the MDP of Fig. 3,
and, for the sake of argument, let us prune the (small) entry s3 ↦ 1/4 to 0. Now,
continuing with the trace z0z0z1, we would update the beliefs from before and
then conclude that this trace cannot be observed with positive probability. With
pruning, there is no upper bound on the difference between the computed risk
and the actual R_r. Thus, forward filtering is, in general, not tractable on MDPs.


5 Unrolling with Model Checking 


We present a tractable algorithm for the monitoring problem. Contrary to filtering,
this method incorporates the state risk. We briefly consider the qualitative
case. An algorithm that solves that problem iteratively guesses a successor such
that the given trace has positive probability, and reaches a state with sufficient
risk. The algorithm only stores the current and next state and a counter.


Theorem 4. The Monitoring Problem with λ = 0 is in NLOGSPACE.


This result implies the existence of a polynomial time algorithm, e.g., using a
graph-search on a graph growing in |τ|. There also is a deterministic algorithm
with space complexity O(log²(|M| + |τ|)), which follows from applying Savitch's
Theorem [46], but that algorithm has exponential time complexity.

We now present a tractable algorithm for the quantitative case, where we 
need to store all paths. We do this efficiently by storing an unrolled MDP with 
these paths using ideas from [9,19]. In particular, on this MDP, we can effi- 
ciently obtain the scheduler that optimizes the risk by model checking rather 
than enumerating over all schedulers explicitly. We give the result before going 
into details. 


Theorem 5. The Monitoring Problem (with λ > 0) is P-complete.


The problem is P-hard, as unary-encoded step-bounded reachability is P-hard [41].
It remains to show a P-time algorithm⁸, which is outlined below. Roughly, the
algorithm constructs an MDP M‴ from M in three conceptual steps, such that the
maximal probability of reaching a state in M‴ coincides with R_r(τ). The
former can be solved by linear programming in polynomial time. The downside is
that even in the best case, the memory consumption grows linearly in |τ|.


8 At first sight, this might be surprising as step-bounded reachability in MDPs is
PSPACE-hard and only quasi-polynomial. However, our problem gets a trace and
therefore (assuming that the trace is not compressed) can be handled in time
polynomial in the length of the trace.


(a) M (b) M′

Fig. 5. Polynomial-time algorithm for solving Problem 1 illustrated.

We outline the main steps of the algorithm and exemplify them below: First,
we transform M into an MDP M′ with deterministic state observations, i.e.,
with obs′: S′ → Z. This construction is detailed in [19, Remark 1], and runs in
polynomial time. The new initial distribution takes into account the initial
observation and the initial distribution. Importantly, for each path π and each
trace τ, obs_tr(π)(τ) is preserved. From here, the idea for the algorithm is a
tailored adaptation of the construction for conditional reachability
probabilities in [9]. We ensure that r(s) ∈ [0,1] by scaling r and λ accordingly.
Now, we construct a new MDP M″ = (S″, ι″, Act″, P″) with state space
S″ := (S′ × {0, …, |τ|−1}) ∪ {⊥, ⊤} and an n-times unrolled transition relation
(n = |τ|). Furthermore, from the states (s, |τ|−1), there is a single outgoing
action that with probability r(s) leads to ⊤ and with probability 1 − r(s) leads
to ⊥. Observe that the risk is now the supremum of conditioned reachability
probabilities over paths that reach ⊤, conditioned by the trace τ. The MDP M″ is
only polynomially larger. Then, we construct MDP M‴ by copying M″ and replacing
(part of) the transition relation P″ by P‴ such that paths π with τ ∉ obs_tr(π)
are looped back to the initial state (resembling rejection sampling). Formally,

$$
P'''\bigl((s,i),\alpha\bigr) =
\begin{cases}
P''\bigl((s,i),\alpha\bigr) & \text{if } obs'(s) = \tau_i,\\
\iota'' & \text{otherwise.}
\end{cases}
$$

The maximal conditional reachability probability in M″ is the maximal
reachability probability in M‴ [9]. Maximal reachability probabilities can be
computed by solving a linear program [43], and can thus be computed in
polynomial time.
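
To make the construction concrete, the following Python sketch computes the maximal probability of reaching ⊤ in the reset-based unrolling for a small explicit MDP. It is an illustration under assumptions, not the authors' implementation: observations are already deterministic (as in M′), the risk is already scaled to [0,1], the trace is non-empty, every state has at least one action, and a plain value iteration stands in for the linear program named in the text. All identifiers (`risk_by_unrolling`, `init`, `P`, `obs`, `r`) are hypothetical.

```python
def risk_by_unrolling(init, P, obs, r, trace, tol=1e-12, max_iter=100_000):
    """Approximate the max probability of reaching TOP in the unrolled MDP.

    init: state -> initial probability; P: state -> action -> {successor: prob};
    obs: state -> observation; r: state -> risk in [0, 1].
    """
    n = len(trace)
    v_reset = 0.0  # current estimate for the value of the initial distribution
    for _ in range(max_iter):
        # Last layer: matching states go to TOP with probability r(s),
        # mismatching states are sent back to the initial distribution.
        nxt = {s: (r[s] if obs[s] == trace[n - 1] else v_reset) for s in P}
        # Earlier layers, processed backwards through the unrolling.
        for i in range(n - 2, -1, -1):
            cur = {}
            for s, actions in P.items():
                if obs[s] != trace[i]:
                    cur[s] = v_reset
                else:
                    cur[s] = max(sum(p * nxt[t] for t, p in dist.items())
                                 for dist in actions.values())
            nxt = cur
        new_reset = sum(p * nxt[s] for s, p in init.items())
        if abs(new_reset - v_reset) < tol:
            return new_reset
        v_reset = new_reset
    return v_reset


# Hypothetical example: action "b" in s0 may leave the observed trace "xx".
P_EX = {
    "s0": {"a": {"s1": 1.0}, "b": {"s3": 0.5, "s2": 0.5}},
    "s1": {"a": {"s1": 1.0}},
    "s2": {"a": {"s2": 1.0}},
    "s3": {"a": {"s3": 1.0}},
}
OBS_EX = {"s0": "x", "s1": "x", "s2": "y", "s3": "x"}
RISK_EX = {"s0": 0.0, "s1": 0.2, "s2": 0.0, "s3": 0.8}
```

Because mismatching paths are sent back to the initial distribution, the only unknown shared across layers is the value of that distribution, which is why the sketch iterates on the single number `v_reset`.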


Example 2. We illustrate the construction in Fig. 5. In Fig. 5(a), we depict an
MDP M, with ι = {s0, s1 ↦ 1/2}. Furthermore, let τ = z0 z0 and let r(s0) = 1
and r(s1) = 2. Let obs(s0) = {z0 ↦ 1} and obs(s1) = {z0 ↦ 1/4, z1 ↦ 3/4}.
State s1 has two possible observations, so we split s1 into s1 and s2 in MDP
M′, each with their own observations. Any transition into s1 is now split. As
|τ| = 2, we unroll the MDP M′ into MDP M″ to represent two steps, and
add goal and sink states. After rescaling, we obtain that r(s0) = 1/2, whereas
r(s1) = r(s2) = 2/2 = 1, and we add the appropriate outgoing transitions to
the states s_i^1. In a final step, we create MDP M‴ from M″: we reroute all
probability mass that does not agree with the observations to the initial states.
Now, R_r(z0 z0) is given by the probability to reach ⊤ in M‴ in an unbounded
number of steps.


The construction also implies that maximizing over a finite set of schedulers,
namely the deterministic schedulers with a counter from 0 to |τ|, suffices. We
denote this class Σ_DC(|τ|). Formally, a scheduler σ is in Σ_DC(k) if for all π, π′:

$$
\bigl(\pi_{\downarrow} = \pi'_{\downarrow} \land (|\pi| = |\pi'| \lor (|\pi| > k \land |\pi'| > k))\bigr)
\text{ implies } \sigma(\pi) = \sigma(\pi').
$$


Lemma 8. For every τ, it holds that

$$
R_r(\tau) = \max_{\sigma \in \Sigma_{DC}(|\tau|)} \sum_{\pi} \Pr\nolimits_{\sigma}(\pi \mid \tau) \cdot r(\pi_{\downarrow}).
$$


The crucial idea underpinning this lemma is that memoryless schedulers suffice
for the unrolling, and that the states of the unrolling can be uniquely mapped to
a state and the length of the history for every π through M. By reducing
step-bounded reachability we can also show that this set of schedulers is
necessary [4].


6 Empirical Evaluation 


Implementation. We provide prototype implementations for both filtering- and
model-checking-based approaches from Sect. 3, built on top of the probabilistic
model checker STORM [30]. We provide a schematic setup of our implementation
in Fig. 6. As input, we consider a symbolic description of MDPs with
state-based observation labels, based on an extended dialect of the Prism language.
We define the state risk in this MDP via a temporal property (given as a PCTL
formula), and obtain the concrete state-risk by model checking. We take a seed
that yields a trace using the simulator. For the experiments, actions are resolved
uniformly in this simulator⁹. The simulator iteratively feeds observations into
the monitor, running either of our two algorithms (implemented in C++). After
each observation z_i, the monitor computes the risk R_i having observed z_0 … z_i.
We flexibly combine these components via a Python API¹⁰.

For filtering as in Sect. 4, we provide a sparse data structure for beliefs that is
updated using only deterministic schedulers. This is sufficient, see Lemma 4. To
further prune the set of beliefs, we implement an SMT-driven elimination [48]
of interior beliefs, inside of the convex hull¹¹. We construct the unrolling as
described in Sect. 5 and apply model checking via any of the sparse engines in
STORM.


9 This is not an assumption but rather our evaluation strategy.
10 Available at https://github.com/monitoring-MDPs/premise.


[Figure: the inputs (MDP in the Prism language, state risk as temporal spec, seed)
feed a model checker and a simulator; the monitor runs either the forward filter
(Sec. 4, uses convex hull) or the unrolling (Sec. 5, uses model checking).]

Fig. 6. Schematic setup for prototype mapping stream z_0 … z_i to stream R_0 … R_i.


Reproducibility. We archived a container with sources, benchmarks, and scripts
to reproduce our experiments: https://doi.org/10.5281/zenodo.4724622.


Set-Up. For each benchmark described below, we sampled 50 random traces
using seeds 0-49 of lengths up to |τ| = 500. We are interested in the
promptness, that is, the delay between getting an observation z_i and returning
the corresponding risk r_i, as well as the cumulative performance obtained by
summing over the promptness along the trace. We use a timeout of 1 second for
this query. We compare the forward filtering (FF) approach with and without
convex hull (CH) reduction, and the model unrolling approach (UNR) with two
model checking engines of STORM: exact policy iteration (EPI, [43]) and
optimistic value iteration (OVI, [28]). All experiments are run on a MacBook Pro
MV962LL/A, using a single core. The memory limit of 6GB was not violated.
We use Z3 [38] as SMT-solver [11] for the convex hull reduction.


Benchmarks. We present three benchmark families, all MDPs with a combination 
of probabilities, nondeterminism and partial observability. 


AIRPORT-A is as in Sect. 1, but with a higher resolution for both the ground
vehicle in the middle lane and the plane. AIRPORT-B has a two-state sensor model
with stochastic transitions between them.


REFUEL-A models robots with a depleting battery and recharging stations. The
world model consists of a robot moving around in a D×D grid with some
dedicated charging cells, where each action costs energy. The risk is to deplete
the battery within a fixed horizon. REFUEL-B is a two-state sensor variant.


EVADE-I is inspired by a navigation task in a multi-agent setting in a D×D grid.
The monitored robot moves randomly, and the risk is defined as the probability
of crashing with the other robot. The other robot has an internal incentive in
the form of a cardinal direction, and nondeterministically decides to move or
to uniformly randomly change its incentive. The monitor observes everything
except the incentive of the other robot. EVADE-V is an alternative navigation
task: Contrary to above, the other robot does not have an internal state and
indeed navigates nondeterministically in one of the cardinal directions. We only
observe the other robot's location if it is within the view range.


11 Advanced algorithms like Quickhull [10] are not applicable without significant
adaptations, as the set of beliefs can be degenerate (roughly, a set without full
rank).


Table 1. Performance for promptness of online monitoring on various benchmarks.

                                           | Forward Filtering (with CH)           | Unrolling
Id Name      Inst     |S|     |P|      |τ| | N  Tavg Tmax Bavg Bmax Davg Dmax      | N  Tavg Tmax |Su|avg |Su|max
1  AIRPORT-A 7,50,30  20910   114143   100 | 50 0.01 0.01 ?    ?    ?    7         | 50 0.04 0.11 524     599
                                       500 | 50 0.01 0.01 1.0  1    1.0  1         | 50 0.01 0.01 1075    1258
2  AIRPORT-B 3,50,30  20232   106012   100 | ?  ?    ?    ?    ?    ?    ?         | ?  ?    0.16 556     629
                                       500 | 0  -    -    -    -    -    -         | 50 0.01 0.01 1460    1647
3  AIRPORT-B 7,50,30  41820   308474   100 | ?  ?    ?    ?    ?    ?    ?         | ?  ?    ?    ?       ?
                                       500 | 0  -    -    -    -    -    -         | 11 0.02 0.02 2097    2297
4  REFUEL-A  12,50    45073   2431691  100 | 50 0.01 0.01 2.2  ?    2.8  5         | 50 0.01 0.05 325     409
                                       500 | 50 0.01 0.01 1.5  ?    1.7  5         | 50 0.01 0.19 1071    2409
5  REFUEL-B  12,50    90145   9725277  100 | 50 0.06 0.23 4.2  ?    5.6  10        | 50 0.04 0.17 608     732
                                       500 | 50 0.01 0.01 2.9  ?    3.3  10        | 46 0.04 0.09 2171    4688
6  EVADE-I   15       377101  2022295  100 | 50 0.01 0.02 2.6  10   3.3  ?         | 49 0.01 0.06 332     363
                                       500 | 50 0.01 0.01 2.4  5    3.4  4         | 45 0.08 0.90 1655    1891
7  EVADE-V   5,3      1001    5318     100 | 26 0.01 0.01 1.0  1    1.0  1         | 50 0.00 0.02 134     241
                                       500 | 25 0.01 0.01 1.0  1    1.0  1         | 50 0.00 0.01 538     671
8  EVADE-V   6,3      2161    11817    100 | 1  0.01 0.01 1.0  1    1.0  1         | 50 0.02 0.32 319     861
                                       500 | 1  0.01 0.01 1.0  1    1.0  1         | 49 0.01 0.02 777     1484


Results. We split our results in two tables. In Table 1, we give an ID for every
benchmark name and instance, along with the size of the MDP (nr. of states
|S| and transitions |P|) our algorithms operate on. We consider the promptness
after prefixes of length |τ|. In particular, for forward filtering with the convex
hull optimization, we give the number N of traces that did not time out before,
and consider the average T_avg and maximal time T_max needed (over all sampled
traces that did not time out before). Furthermore, we give the average, B_avg,
and maximal, B_max, number of beliefs stored (after reduction), and the average,
D_avg, and maximal, D_max, dimension of the belief support. Likewise, for
unrolling with exact model checking, we give the number N of traces that did not
time out before, and we consider average T_avg and maximal time T_max, as well as
the average size and maximal number of states of the unfolded MDP.

In Table 2, we consider for the benchmarks above the cumulative performance.
In particular, this table also considers an alternative implementation for
both FF and UNR. We use the IDs to identify the instance, and sum for each
prefix of length |τ| the time. For filtering, we recall the number of traces N that
did not time out, the average and maximal cumulative time along the trace,
the average cumulative number of beliefs that were considered, and the average
cumulative number of beliefs eliminated. For the case without convex hull, we
do not eliminate any vertices. For unrolling, we report average T_avg and
maximal cumulative time using EPI, as well as the time required for model
building, Bld% (relative to the total time, per trace). We compare this to the
average and maximal cumulative time for using OVI (notice that building times
remain approximately the same).


Table 2. Summarized performance for online monitoring

        | FF w/o CH            | FF w/ CH                  | UNR (EPI)                      | UNR (OVI)
Id |τ|  | N  Tavg Tmax Bavg    | N  Tavg Tmax Bavg Eavg    | N  Tavg Tmax Bld%avg Bld%max   | N  Tavg Tmax
1  100  | 0  -    -    -       | 50 0.9  1.1  493  241     | 50 2.9  3.6  6       56        | 50 0.0  0.1
   500  | 0  -    -    -       | 50 3.7  4.3  1040 316     | 50 7.5  10.7 21      24        | 50 0.4  0.8
2  100  | 0  -    -    -       | 0  -    -    -    -       | 50 3.7  4.7  6       54        | 50 0.1  0.1
   500  | 0  -    -    -       | 0  -    -    -    -       | 50 11.9 17.1 18      23        | 50 0.6  0.8
3  100  | 0  -    -    -       | 0  -    -    -    -       | 50 7.6  10.6 5       55        | 50 0.1  0.2
   500  | 0  -    -    -       | 0  -    -    -    -       | 11 21.3 28.7 19      23        | 50 0.9  1.7
4  100  | 1  0.9  0.9  1473    | 50 0.7  0.8  241  138     | 50 0.7  1.0  35      69        | 50 0.0  0.1
   500  | 1  0.9  0.9  1873    | 50 3.4  3.7  868  226     | 50 5.6  21.2 57      67        | 50 0.5  0.9
5  100  | 0  -    -    -       | 50 7.4  10.7 442  2267    | 50 2.5  4.4  32      57        | 50 0.1  0.2
   500  | 0  -    -    -       | 50 16.5 42.2 1781 4249    | 46 19.5 64.2 55      70        | 50 1.3  2.3
6  100  | 13 0.7  2.9  2055    | 50 1.1  4.8  273  160     | 49 0.5  2.0  34      65        | 47 0.0  0.1
   500  | 2  4.4  6.8  20524   | 50 5.1  11.5 1237 632     | 45 22.4 53.6 13      29        | 43 0.5  0.7
7  100  | 13 0.1  0.5  274     | 26 0.8  1.2  106  11      | 50 0.4  1.0  19      45        | 48 0.0  0.1
   500  | 13 0.1  0.5  674     | 25 3.7  4.2  505  7       | 50 1.3  4.4  46      58        | 47 0.2  0.3
8  100  | 0  -    -    -       | 1  1.3  1.3  124  109     | 50 1.5  7.0  15      39        | 36 0.4  5.6
   500  | 0  -    -    -       | 1  4.3  4.3  524  109     | 49 4.9  28.1 37      56        | 35 0.7  6.4


Discussion. The results from our prototype show that conservative (sound) pre- 
dictive modeling of systems that combine probabilities, nondeterminism and 
partial observability is within reach with the methods we proposed and
state-of-the-art algorithms. Both forward filtering and unrolling-based approaches
have their merits. The practical results thus slightly diverge from the complexity
results in Sect. 3.1, due to structural properties of some benchmarks. In
particular, for AIRPORT-A and REFUEL-A, the nondeterminism barely influences
the belief, and so there is no explosion, and consequently the dimension of
the belief is sufficiently small that the convex hull can be efficiently computed.
Rather than the number of states, this belief dimension makes EVADE-V a
difficult benchmark¹². If many states can be reached with a particular trace, and
if along these paths there are some probabilistic states, forward filtering suffers 
significantly. We see that if the benchmark allows for efficacious forward filter- 
ing, it is not slowed down in the way that unrolling is slower on longer traces. 
For UNR, we observe that OVI is typically the fastest, but EPI does not suffer 
from the numerical worst-cases as OVI does. If an observation trace is unlikely, 
the unrolled MDP constitutes a numerically challenging problem, in particular 
for value-iteration based model checkers, see [27]. For FF, the convex hull com- 
putation is essential for any dimension, and eliminating some vertices in every 
step keeps the number of belief states manageable. 


12 The max dimension = 1 in EVADE-V is only over the traces that did not time out.
The dimension when running into time-outs is above 5.


7 Related Work


We are not the first to consider model-based runtime verification in the presence 
of partial observability and probabilities. Runtime verification with state
estimation on hidden Markov models (HMMs), i.e., without nondeterminism, has been
studied for various types of properties [51,54,57] and has been extended to hybrid
systems [52]. The tool Prevent focusses on black-box systems by learning an 
HMM from a set of traces. The HMM approximates (with only convergence-in- 
the-limit guarantees) the actual system [6], and then estimates during runtime 
the most likely trace rather than estimating a distribution over current states. 
Extensions consider symmetry reductions on the models [7]. These techniques 
do not make a conservative (sound) risk estimation. The recent framework for 
runtime verification in the presence of partial observability [23] takes a more 
strict black-box view and cannot provide state estimates. Finally, [26] chooses to 
have partial observability to make monitoring of software systems more efficient, 
and [58] monitors a noisy sensor to reduce energy consumption. 

State beliefs are studied when verifying HMMs [59], where the question is
whether a sequence of observations likely occurs, or which HMM is an adequate
representation of a system [37]. State beliefs are prominent in the verification of
partially observable MDPs [16,32,40], where one can observe the actions taken 
(but the problem itself is to find the right scheduler). Our monitoring problem 
can be phrased as a special case of verification of partially observable stochastic 
games [20], but automatic techniques for those very general models are lack- 
ing. Likewise, the idea of shielding (pre)computes all action choices that lead 
to safe behavior [3,5,15,24,34,35]. For partially observable settings, shielding 
again requires to compute partial-information schedulers [21,39], contrary to 
our approach. Partial observability has also been studied in the context of diag- 
nosability, studying if a fault has occurred (in the past) [14], or what actions 
uncover faults [13]. We instead assume partial observability in which we do
detect faults, but want to estimate the risk that these faults occur in the future. 

The assurance framework for reinforcement learning [42] implicitly allows 
for stochastic behavior, but cannot cope with partial observability or nondeter- 
minism. Predictive monitoring has been combined with deep learning [17] and 
Bayesian inference [22], where the key problem is that the computation of an 
imminent failure is too expensive to be done exactly. More generally, learning 
automata models has been motivated with runtime assurance [1,55]. Testing 
approaches statistically evaluate whether traces are likely to be produced by a 
given model [25]. The approach in [2] studies stochastic black-box systems with 
controllable nondeterminism and iteratively learns a model for the system. 


8 Conclusion 


We have presented the first framework for monitoring based on a trace of obser- 
vations on models that combine nondeterminism and probabilities. Future work 
includes heuristics for approximate monitoring and for faster convex hull
computations, and to apply this work to gray-box (learned) models.


Abstract. We revisit the symbolic verification of Markov chains with
respect to finite horizon reachability properties. The prevalent approach 
iteratively computes step-bounded state reachability probabilities. By 
contrast, recent advances in probabilistic inference suggest symbolically 
representing all horizon-length paths through the Markov chain. We ask 
whether this perspective advances the state-of-the-art in probabilistic 
model checking. First, we formally describe both approaches in order 
to highlight their key differences. Then, using these insights we develop 
RUBICON, a tool that transpiles PRISM models to the probabilistic infer- 
ence tool Dice. Finally, we demonstrate better scalability compared to 
probabilistic model checkers on selected benchmarks. All together, our 
results suggest that probabilistic inference is a valuable addition to the 
probabilistic model checking portfolio, with RUBICON as a first step 
towards integrating both perspectives. 


1 Introduction 


Systems with probabilistic uncertainty are ubiquitous, e.g., probabilistic pro- 
grams, distributed systems, fault trees, and biological models. Markov chains 
replace nondeterminism in transition systems with probabilistic uncertainty, and 
probabilistic model checking [4,7] provides model checking algorithms. A key 
property that probabilistic model checkers answer is: What is the (precise) prob- 
ability that a target state is reached (within a finite number of steps h)? Contrary 
to classical qualitative model checking and approximate variants of probabilistic 
model checking, precise probabilistic model checking must find the total
probability of all paths from the initial state to any target state.


(a) Motivating factory Markov chain with s_i = [c_i = 0], t_i = [c_i = 1].

  dtmc
  const double p1; const double p2; const double p3;
  const double q1; const double q2; const double q3;

  module F1
    c1 : bool init false;
    // not striking: start to strike with probability p1
    [a] !c1 -> p1: (c1'=true) + 1-p1: (c1'=false);
    // striking: stop striking with probability q1
    [a]  c1 -> q1: (c1'=false) + 1-q1: (c1'=true);
  endmodule

  module F2 = F1[c1=c2, p1=p2, q1=q2] endmodule
  module F3 = F1[c1=c3, p1=p3, q1=q3] endmodule
  label "allStrike" = c1 & c2 & c3;

(b) A PRISM model of (a) with 3 factories. (c) Relative scaling (x-axis: # parallel chains). (d) BDD


Fig. 1. Motivating example. Figure 1(c) compares the performance of RUBICON,
STORM's explicit engine, STORM's symbolic engine, and PRISM when invoked on (b)
with arbitrarily fixed (different) constants for p_i, q_i and horizon
h = 10. Times are in seconds, with a time-out of 30 min.


Nevertheless, the prevalent ideas in probabilistic model checking are gener- 
alizations of qualitative model checking. Whereas qualitative model checking 
tracks the states that can reach a target state (or dually, that can be reached 
from an initial state), probabilistic model checking tracks the i-step reachability 
probability for each state in the chain. The i+1-step reachability can then be 
computed via multiplication with the transition matrix. The scalability concern 
is that this matrix grows with the state space in the Markov chain. Mature model 
checking tools such as STORM [36], Modest [34], and PRISM [51] utilize a variety 
of methods to alleviate the state space explosion. Nevertheless various natural 
models cannot be analyzed by the available techniques. 
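
The step-wise computation described above can be sketched for the factory example with an explicit state space. This is only an illustration in Python, not how STORM or PRISM are implemented; the function names and the representation of states as tuples of Booleans (one "is striking" flag per factory) are assumptions made here. Each iteration turns the i-step reachability vector into the (i+1)-step vector by one multiplication with the transition function.

```python
from itertools import product


def factory_chain(p, q):
    """States and transition function for n synchronous, independent factories."""
    n = len(p)
    states = list(product([False, True], repeat=n))

    def prob(s, t):
        out = 1.0
        for i in range(n):
            flip = q[i] if s[i] else p[i]  # probability that factory i changes state
            out *= flip if s[i] != t[i] else 1.0 - flip
        return out

    return states, prob


def bounded_reach(states, prob, targets, h):
    """x[s] = probability to reach a target state from s within h steps."""
    x = {s: (1.0 if s in targets else 0.0) for s in states}
    for _ in range(h):
        x = {s: (1.0 if s in targets else sum(prob(s, t) * x[t] for t in states))
             for s in states}
    return x
```

Every iteration touches all pairs of states, i.e., 4^n entries for n factories, which is the scalability concern raised in the text.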

In parallel, within the AI community a different approach to representing a 
distribution has emerged, which at first glance can seem unintuitive. Rather than
marginalizing out the paths and tracking reachability probabilities per state, the 
probabilistic AI community commonly aggregates all paths that reach the target 
state. At its core, inference is then a weighted sum over all these paths [16]. 
This hinges on the observation that this set of paths can often be stored more 
compactly, and that the probability of two paths that share the same prefix or 
suffix can be efficiently computed on this concise representation. This inference 
technique has been used in a variety of domains in the artificial intelligence 
(AI) and verification communities [9,14,27,39], but is not part of any mature 
probabilistic model checking tools. 

This paper theoretically and experimentally compares and contrasts these
two approaches. In particular, we describe and motivate RUBICON, a probabilistic
model checker that leverages the successful probabilistic inference techniques. We
begin with an example that explains the core ideas of RUBICON followed by the
paper structure and key contributions.


Motivating Example. Consider the example illustrated in Fig. 1(a). Suppose 
there are n factories. Each day, the workers at each factory collectively decide 
whether or not to strike. To simplify, we model each factory (i) with two states, 
striking (t;) and not striking (s;). Furthermore, since no two factories are identi- 
cal, we take the probability to begin striking (p;) and to stop striking (q;) to be 
different for each factory. Assuming that each factory transitions synchronously 
and in parallel with the others, we query: “what is the probability that all the 
factories are simultaneously striking within h days?” 

Despite its simplicity, we observe that state-of-the-art model checkers like 
STORM and PRISM do not scale beyond 15 factories.¹ For example, Fig. 1(b)
provides a PRISM encoding for this simple model (we show the instance with 
3 factories), where a Boolean variable c; is used to encode the state of each 
factory. The “allStrike” label identifies the target state. Figure 1(c) shows the 
run time for an increasing number of factories. While all methods eventually 
time out, RUBICON scales to systems with an order of magnitude more states. 


Why is This Problem Hard? To understand the issue with scalability, observe 
that tools such as STORM and PRISM store the transition matrix, either explicitly 
or symbolically using algebraic decision diagrams (ADDs). Every distinct entry 
of this transition matrix needs to be represented; in the case of ADDs using a 
unique leaf node. Because each factory in our example has a different probability 
of going on strike, that means each subset of factories will likely have a unique 
probability of jointly going on strike. Hence, the transition matrix then will 
have a number of distinct probabilities that is exponential in the number of 
factories, and its representation as an ADD must blow up in size. Concretely, 
for 10 factories, the size of the ADD representing the transition matrix has 1.9 
million nodes. Moreover, the explicit engine fails due to the dense nature of the 
underlying transition matrix. We discuss this method in Sect. 3. 
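
The growth in distinct matrix entries can be checked on small instances. The following Python sketch is illustrative only: the helper name `distinct_entries` and the concrete parameter values are assumptions made here. It enumerates all entries of the explicit transition matrix for n independent factories and counts how many different values occur, using exact fractions so that equal products are not separated by rounding.

```python
from fractions import Fraction
from itertools import product


def distinct_entries(p, q):
    """Number of different values in the transition matrix of n factories."""
    n = len(p)
    states = list(product([0, 1], repeat=n))
    values = set()
    for s in states:
        for t in states:
            entry = Fraction(1)
            for i in range(n):
                flip = q[i] if s[i] else p[i]  # probability that factory i changes
                entry *= flip if s[i] != t[i] else 1 - flip
            values.add(entry)
    return len(values)


# Hypothetical parameter values chosen so that all products differ.
P_VALS = [Fraction(1, 3), Fraction(1, 7)]
Q_VALS = [Fraction(1, 5), Fraction(1, 11)]
```

With these values, one factory yields 4 distinct entries and two factories yield 16; each distinct value needs its own ADD leaf.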


How to Overcome This Limitation? This problematic combinatorial explosion 
is often unnecessary. For the sake of intuition, consider the simple case where 
the horizon is 1. Still, the standard transition matrix representations blow up 
exponentially with the number of factories n. Yet, the probability of reaching 
the “allStrike” state is easy to compute, even when n grows: it is p_1 · p_2 ⋯ p_n.

RUBICON aims to compute probabilities in this compact factorized way by
representing the computation as a binary decision diagram (BDD). Figure 1(d)
gives an example of such a BDD, for three factories and a horizon of one. A key
property of this BDD, elaborated in Sect. 3, is that it can be interpreted as a
parametric Markov chain, where the weight of each edge corresponds with the
probability of a particular factory striking. Then, the probability that the goal
state is reached is given by the weighted sum of paths terminating in T: for this
instance, there is a single such path with weight p_1 · p_2 · p_3. These BDDs are
tree-like Markov chains, so model checking can be performed in time linear in the
size of the BDD using dynamic programming. Essentially, the BDD represents the
set of paths that reach a target state, an idea common in probabilistic inference.


1 Section 6 describes the experimental apparatus and our choice of comparisons.


To construct this BDD, we propose to encode our reachability query
symbolically as a weighted model counting (WMC) query on a logical formula. By
compiling that formula into a BDD, we obtain a diagram where computing the
query probability can be done efficiently (in the size of the BDD). Concretely
for Fig. 1(d), the BDD represents the formula c_1^(1) ∧ c_2^(1) ∧ c_3^(1), which
encodes all paths through the chain that terminate in the goal state (all
factories strike on day 1). For this example and this horizon, this is a single
path. WMC is a well-known strategy for probabilistic inference and is currently
among the state-of-the-art approaches for discrete graphical models [16],
discrete probabilistic programs [39], and probabilistic logic programs [27].
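
As a point of reference, weighted model counting can be written down by brute force. The Python sketch below is not how Dice or RUBICON compute it (they compile the formula into a BDD); it enumerates all assignments, which is exponential in the number of variables. The names `wmc` and `weights` are assumptions made here, and the variables are taken to be independent, with `weights[v]` the probability that `v` is true.

```python
from itertools import product


def wmc(formula, weights):
    """Sum the weights of all assignments that satisfy `formula`."""
    names = sorted(weights)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if formula(world):
            w = 1.0
            for v in names:
                w *= weights[v] if world[v] else 1.0 - weights[v]
            total += w
    return total


def all_strike(world):
    """The formula c1 and c2 and c3 from the horizon-one factory example."""
    return world["c1"] and world["c2"] and world["c3"]
```

For the horizon-one example the only satisfying assignment sets all three variables to true, so the count is the product of the three weights.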

In general, the exponential growth of the number of paths might seem like 
it dooms this approach: for n = 3 factories and horizon h = 1, we need to 
only represent 8 paths, but for h = 2, we would need to consider 64 different 
paths, and so on. However, a key insight is that, for many systems — such as 
the factory example — the structural compression of BDDs allows a concise rep- 
resentation of exponentially many paths, all while being parametric over path 
probabilities (see Sect. 4). To see why, observe that in the above discussion, the 
state of each factory is independent of the other factories: independence, and 
its natural generalizations like conditional and contextual independence, are the 
driving force behind many successful probabilistic inference algorithms [47]. Suc- 
cinctly, the key advantage of RUBICON is that it exploits a form of structure that 
has thus far been under-exploited by model checkers, which is why it scales to 
more parallel factories than the existing approaches on the hard task. In Sect. 6 
we consider an extension to this motivating example that adds dependencies 
between factories. This dependency (or rather, the accompanying increase in 
the size of the underlying MC) significantly decreases scalability for the existing 
approaches but negligibly affects RUBICON. 

This leads to the task: how does one go from a PRISM model to a concise BDD 
efficiently? To do this, RUBICON leverages a novel translation from PRISM models 
into a probabilistic programming language called Dice (outlined in Sect. 5). 


Contribution and Structure. Inspired by the example, we contribute concep- 
tual and empirical arguments for leveraging BDD-based probabilistic inference 
in model checking. Concretely: 


1. We demonstrate fundamental advantages in using probabilistic inference on 
a natural class of models (Sect. 1 and 6). 

2. We explain these advantages by showing the fundamental differences between 
existing model checking approaches and probabilistic inference (Sect. 3 and 4). 
To that end, Sect. 4 presents probabilistic inference based on an operational 
and a logical perspective and combines these perspectives. 

3. We leverage those insights to build RUBICON, a tool that transpiles PRISM to
Dice, a probabilistic programming language (Sect. 5).

4. We demonstrate that RUBICON indeed attains an order-of-magnitude scaling
improvement on several natural problems including sampling from parametric
Markov chains and verifying network protocol stabilization (Sect. 6).


(a) Toy-example M (b) pMC M′ (c) For M: P as ADD

Fig. 2. (a) MC toy example (b) (distinct) pMC toy example (c) ADD transition matrix


Ultimately we argue that RUBICON makes a valuable contribution to the port- 
folio of probabilistic model checking backends, and brings to bear the extensive 
developments on probabilistic inference to well-known model checking problems. 


2 Preliminaries and Problem Statement 


We state the problem formally and recap relevant concepts. See [7] for details. We
sometimes use p̄ to denote 1−p. A Markov chain (MC) is a tuple M = (S, ι, P, T)
with S a (finite) set of states, ι ∈ S the initial state, P: S → Distr(S) the
transition function, and T ⊆ S a set of target states, where Distr(S) is the set
of distributions over a (finite) set S. We write P(s, s′) to denote P(s)(s′) and call
P a transition matrix. The successors of s are Succ(s) = {s′ | P(s, s′) > 0}. To
support MCs with billions of states, we may describe MCs symbolically, e.g., with
PRISM [51] or as a probabilistic program [42,48]. For such a symbolic description
𝒫, we denote the corresponding MC with ⟦𝒫⟧. States then reflect assignments
to symbolic variables.

A path π = s_0 … s_n is a sequence of states, π ∈ S⁺. We use π↓ to denote
the last state s_n, and the length of π above is n and is denoted |π|. Let Paths_h
denote the paths of length h. The probability of a path is the product of the
transition probabilities, and may be defined inductively by Pr(s) = 1,
Pr(π · s) = Pr(π) · P(π↓, s). For a fixed horizon h and set of states T, let the set
⟦s → ◇^{≤h} T⟧ = {π | π_0 = s ∧ |π| ≤ h ∧ π↓ ∈ T ∧ ∀i < |π|. π_i ∉ T} denote paths
from s of length at most h that terminate at a state contained in T. Furthermore,
let Pr_M(s ⊨ ◇^{≤h} T) = Σ_{π ∈ ⟦s → ◇^{≤h} T⟧} Pr(π) describe the probability to reach
T within h steps. We simplify notation when s = ι and write ⟦◇^{≤h} T⟧ and
Pr_M(◇^{≤h} T), respectively. We omit M whenever that is clear from the context.

Formal Problem: Given an MC M and a horizon h, compute Pr_M(◇^{≤h} T).


Example 1. For conciseness, we introduce a toy example MC M in Fig. 2(a).
For horizon h = 3, there are three paths that reach state (1,0): for example, the
path (0,0)(0,1)(1,0) with corresponding reachability probability 0.4 · 0.5. The
reachability probability is Pr_M(◇^{≤3} {(1,0)}) = 0.42.
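
The path-based definition above can be turned into code directly. The following Python sketch enumerates exactly the paths of length at most h that end in a target state and visit no target state earlier, and sums their probabilities. The function names and the dictionary encoding of the transition function are assumptions made here, and the two-state chain is a hypothetical example, not the MC of Fig. 2(a).

```python
def target_paths(P, s, targets, h, prob=1.0, prefix=()):
    """Yield (path, probability) for paths from s that first hit `targets` within h steps."""
    path = prefix + (s,)
    if s in targets:
        yield path, prob
    elif h > 0:
        for t, p in P[s].items():
            yield from target_paths(P, t, targets, h - 1, prob * p, path)


def reach_prob(P, s, targets, h):
    """Probability to reach `targets` from s within h steps, as a sum over paths."""
    return sum(prob for _, prob in target_paths(P, s, targets, h))


# Hypothetical chain: from "a", stay or move to the absorbing state "b".
CHAIN = {"a": {"a": 0.5, "b": 0.5}, "b": {"b": 1.0}}
```

The number of enumerated paths can grow exponentially in h, which is why a compact representation of the path set matters.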


It is helpful to separate the topology and the probabilities. We do this by 
means of a parametric MC (pMC) [22]. A pMC over a fixed set of parameters
$\vec{p}$ generalises MCs by allowing for a transition function that maps to
$\mathbb{Q}[\vec{p}]$, i.e., to polynomials over these variables [22]. A pMC and
a valuation of parameters $u\colon \vec{p} \to \mathbb{R}$ describe an MC by
replacing $\vec{p}$ with $u$ in the transition function $P$ to obtain $P[u]$. If
$P[u](s)$ is a distribution for every $s$, then we call $u$ a well-defined
valuation. We can then think about a pMC $M$ as a generator of a set of MCs
$\{ M[u] \mid u \text{ well-defined} \}$. Figure 2(b) shows a pMC; any valuation
$u$ with $u(p), u(q) \in [0, 1]$ is well-defined. We consider the following
associated problem:


Parameter Sampling: Given a pMC $M$, a finite set of well-defined valuations
$U$, and a horizon $h$, compute $\Pr_{M[u]}(\Diamond^{\le h} T)$ for each $u \in U$.
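To make the parameter sampling problem concrete, the sketch below instantiates a small pMC at several valuations and evaluates each instance separately. The pMC used here is hypothetical: Fig. 2(b) is not reproduced in the text, so we assume the topology of the toy MC with parameters p and q on the edges. All function names are ours.

```python
def pmc(u):
    """Instantiate a hypothetical pMC (parameters p, q) at valuation u."""
    p, q = u["p"], u["q"]
    return {
        "s": {"s": p, "t": 1 - p},
        "t": {"u": q, "v": 1 - q},
        "u": {"u": 1.0},
        "v": {"u": q, "v": 1 - q},
    }


def is_well_defined(P, tol=1e-12):
    """A valuation is well-defined if every row is a distribution."""
    return all(
        all(0.0 <= x <= 1.0 for x in row.values())
        and abs(sum(row.values()) - 1.0) <= tol
        for row in P.values()
    )


def reach_within(P, state, targets, horizon):
    """Probability to reach `targets` from `state` within `horizon` steps."""
    if state in targets:
        return 1.0
    if horizon == 0:
        return 0.0
    return sum(p * reach_within(P, succ, targets, horizon - 1)
               for succ, p in P[state].items())


def sample_parameters(valuations, horizon, start="s", targets=("u",)):
    """Solve one bounded-reachability query per valuation."""
    results = []
    for u in valuations:
        P = pmc(u)
        if not is_well_defined(P):
            raise ValueError(f"valuation {u} is not well-defined")
        results.append(reach_within(P, start, set(targets), horizon))
    return results


samples = sample_parameters([{"p": 0.6, "q": 0.5}, {"p": 0.0, "q": 1.0}], 3)
```

Each valuation is solved from scratch here; a parametric representation of the result would allow the work to be shared across valuations.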


We recap binary decision diagrams (BDDs) and their generalization into
algebraic decision diagrams (ADDs, a.k.a. multi-terminal BDDs). ADDs over a
set of variables $X$ are directed acyclic graphs whose vertices $V$ can be
partitioned into terminal nodes $V_t$ without successors and inner nodes $V_i$
with two successors. Each terminal node is labeled with a polynomial over some
parameters $\vec{p}$ (or just with a constant in $\mathbb{Q}$),
$\mathit{val}\colon V_t \to \mathbb{Q}[\vec{p}]$, and each inner node with a
variable, $\mathit{var}\colon V_i \to X$. One node is the root node $v_0$. Edges
are described by the two successor functions $E_0\colon V_i \to V$ and
$E_1\colon V_i \to V$. A BDD is an ADD with exactly two terminals labeled T and
F. Formally, we denote an ADD by the tuple
$(V, v_0, X, \mathit{var}, \mathit{val}, E_0, E_1)$. ADDs describe functions
$f\colon \mathbb{B}^X \to \mathbb{Q}[\vec{p}]$ (described by a path in the
underlying graph and the label of the corresponding terminal node). As finite
sets can be encoded with bit vectors, ADDs represent functions from (tuples of)
finite sets to polynomials.


Example 2. The transition matrix P of the MC in Fig. 2(a) maps states, encoded
by bit vectors, (x,y), (x',y') to the probabilities to move from state (x,y) to
(x',y'). Figure 2(c) shows the corresponding ADD.²
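As an illustration of Example 2, the following sketch stores the transition matrix of the toy MC as a reduced ADD over the variable order x, y, x', y'. It is a brute-force construction that expands all 16 bit vectors, so it only illustrates the data structure and is not how a model checker builds ADDs. The transition probabilities are those implied by Examples 1 and 5, and the self-loop on (1,0) is an assumption. The class and method names are ours.

```python
from itertools import product

# Toy MC; the row of the target state (1,0) is an assumed self-loop.
P = {
    (0, 0): {(0, 0): 0.6, (0, 1): 0.4},
    (0, 1): {(1, 0): 0.5, (1, 1): 0.5},
    (1, 0): {(1, 0): 1.0},
    (1, 1): {(1, 0): 0.5, (1, 1): 0.5},
}


def entry(bits):
    """Matrix entry for the bit vector (x, y, x', y')."""
    x, y, xn, yn = bits
    return P[(x, y)].get((xn, yn), 0.0)


class ADD:
    def __init__(self):
        self.nodes = {}    # id -> ("leaf", value) or ("inner", var, low, high)
        self.unique = {}   # (var, low, high) -> id, shares equal subgraphs
        self.leaves = {}   # value -> id, one terminal per distinct value

    def leaf(self, value):
        if value not in self.leaves:
            self.leaves[value] = len(self.nodes)
            self.nodes[self.leaves[value]] = ("leaf", value)
        return self.leaves[value]

    def inner(self, var, low, high):
        if low == high:                 # redundant test, skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes[self.unique[key]] = ("inner", var, low, high)
        return self.unique[key]

    def build(self, f, nvars, prefix=()):
        """Shannon expansion of f along the fixed variable order."""
        if len(prefix) == nvars:
            return self.leaf(f(prefix))
        low = self.build(f, nvars, prefix + (0,))
        high = self.build(f, nvars, prefix + (1,))
        return self.inner(len(prefix), low, high)

    def evaluate(self, root, bits):
        node = self.nodes[root]
        while node[0] == "inner":
            _, var, low, high = node
            node = self.nodes[high if bits[var] else low]
        return node[1]


add = ADD()
root = add.build(entry, 4)
```

The number of terminals equals the number of distinct matrix entries, which gives a lower bound on the size of the diagram.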


3 A Model Checking Perspective 


We briefly analyze the de-facto standard approach to symbolic probabilistic
model checking of finite-horizon reachability probabilities. It is an adaptation of
qualitative model checking, in which we track the (backward) reachable states.
This set can be thought of as a mapping from states to a Boolean indicating
whether a target state can be reached. We generalize the mapping to a function
that maps every state $s$ to the probability that we reach $T$ within $i$ steps,
denoted $\Pr_M(s \models \Diamond^{\le i} T)$. First, it is convenient to
construct a transition relation in which the target states have been made
absorbing, i.e., we define a matrix $A$ with $A(s,s') = P(s,s')$ if
$s \notin T$ and $A(s,s') = [s = s']$³ otherwise. The following Bellman
equations characterize the aforementioned mapping:


$$\Pr_M(s \models \Diamond^{\le 0} T) = [s \in T],$$

$$\Pr_M(s \models \Diamond^{\le i} T) = \sum_{s' \in S} A(s,s') \cdot \Pr_M(s' \models \Diamond^{\le i-1} T) \quad \text{with } i > 0.$$


    state    h=0    h=1    h=2     h=3
    (0,0)    0      0      0.2     0.42
    (0,1)    0      0.5    0.75    0.875
    (1,0)    1      1      1       1
    (1,1)    0      0.5    0.75    0.875

(a) $\Pr_M(\Diamond^{\le h}\{(1,0)\})$   (b) $\Pr_M(\Diamond^{\le 2}\{(1,0)\})$ as ADD (diagram not reproduced; leaves 1, 0.2, 0.75)


Fig. 3. Bounded reachability and symbolic model checking for the MC M in Fig. 2(a).


2 The ADD also depends on the variable order, which we assume fixed for conciseness.


The main aspect model checkers take from these equations is that to compute
the $h$-step reachability from state $s$, one only needs to combine the
$(h-1)$-step reachability from any state $s'$ and the transition probabilities
$P(s, s')$. We define a vector $T$ with $T(s) = [s \in T]$. The algorithm then
iteratively computes and stores the $i$-step reachability for $i = 0$ to
$i = h$, e.g., by computing $A^3 \cdot T$ as $A \cdot (A \cdot (A \cdot T))$.
This reasoning is thus inherently backwards and implicitly marginalizes out
paths. In particular, rather than storing the $i$-step paths that lead to the
target, one only stores a vector $x = A^i \cdot T$ that stores for every state
$s$ the sum over all $i$-long paths from $s$.
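The backward iteration can be sketched with explicit dictionaries in place of a sparse matrix or an ADD. The sketch below assumes the toy MC implied by Examples 1 and 5 and recomputes the values of Fig. 3(a); the function names are ours.

```python
# Toy MC; the row of the target (1,0) is irrelevant here because targets
# are made absorbing before iterating.
P = {
    (0, 0): {(0, 0): 0.6, (0, 1): 0.4},
    (0, 1): {(1, 0): 0.5, (1, 1): 0.5},
    (1, 0): {(1, 0): 1.0},
    (1, 1): {(1, 0): 0.5, (1, 1): 0.5},
}
TARGETS = {(1, 0)}


def absorbing(P, targets):
    """A(s, s') = P(s, s') if s is not a target, and [s = s'] otherwise."""
    return {s: ({s: 1.0} if s in targets else dict(row)) for s, row in P.items()}


def step_back(A, x):
    """One matrix-vector product x' = A . x."""
    return {s: sum(p * x[t] for t, p in A[s].items()) for s in A}


def reachability_table(P, targets, horizon):
    """Return [x_0, ..., x_h] with x_i = A^i . T."""
    A = absorbing(P, targets)
    x = {s: 1.0 if s in targets else 0.0 for s in P}   # x_0 = T
    table = [x]
    for _ in range(horizon):
        x = step_back(A, x)
        table.append(x)
    return table


table = reachability_table(P, TARGETS, 3)
```

Only one vector per step is kept, so memory is linear in the number of states regardless of how many paths lead to the target.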

Explicit representations of matrix $A$ and vector $x$ require memory at least in
the order of $|S|$. To overcome this limitation, symbolic probabilistic model
checking stores both $A$ and $A^i \cdot T$ as ADDs by considering the matrix as
a function from a tuple $(s,s')$ to $A(s,s')$, and $x$ as a function from $s$ to
$x(s)$ [2].


Example 3. Reconsider the MC in Fig. 2(a). The $h$-bounded reachability
probability $\Pr_M(\Diamond^{\le h}\{(1,0)\})$ can be computed as reflected in
Fig. 3(a). The ADD for P is shown in Fig. 2(c). The ADD for $x$ when h = 2 is
shown in Fig. 3(b).


The performance of symbolic probabilistic model checking is directly governed
by the sizes of these two ADDs. The size of an ADD is bounded from below by the
number of leaves. In qualitative model checking, both ADDs are in fact BDDs,
with two leaves. However, for the ADD representing $A$, this lower bound is
given by the number of different probabilities in the transition matrix. In the
running example, we have seen that a small program $P$ may have an underlying
MC $[\![P]\!]$ with an exponential state space $S$ and equally many different
transition probabilities. Symbolic probabilistic model checking also scales
badly on some models where $A$ has a concise encoding but $x$ has too many
different entries.⁵ Therefore, model checkers may store $x$ partially
explicitly [49].


3 Where [x] = 1 if x holds and 0 otherwise.
4 Excluding, e.g., partial exploration or sampling, which typically are not exact.


[Figure 4: (a) CT(M, 3); (b) CT(M, 3) compressed; (c) Predicate as BDD. Diagrams not reproduced.]


Fig. 4. The computation tree for M and horizon 3 and its compression. We label states
as s=(0,0), t=(0,1), u=(1,0), v=(1,1). Probabilities are omitted for conciseness.

The insights above are not new. Symbolic probabilistic model checking has 
advanced [46] to create small representations of both $A$ and $x$. In competitions,
STORM often applies a bisimulation-to-explicit method that extracts an explicit 
representation of the bisimulation quotient [26,36]. Finally, game-based abstrac- 
tion [32,44] can be seen as a predicate abstraction technique on the ADD level. 
However, these methods do not change the computation of the finite horizon 
reachability probabilities and thus do not overcome the inherent weaknesses of 
the iterative approach in combination with an ADD-based representation. 


4 A Probabilistic Inference Perspective 


We present four key insights into probabilistic inference. (1) Sect. 4.1 shows 
how probabilistic inference takes the classical definition as summing over the 
set of paths, and turns this definition into an algorithm. In particular, these 
paths may be stored in a computation tree. (2) Sect. 4.2 gives the traditional
reduction from probabilistic inference to the classical weighted model counting
(WMC) problem [16,57]. (3) Sect. 4.3 connects this reduction to point (1) by
showing that a BDD that represents this WMC is bisimilar to the computation
tree assuming that the out-degree of every state in the MC is two. (4) Sect. 4.4
describes and compares the computational benefits of the BDD representation. 
In particular, we clarify that enforcing an out-degree of two is a key ingredient 
to overcoming one of the weaknesses of symbolic probabilistic model checking: 
the number of different probabilities in the underlying MC. 


4.1 Operational Perspective 


The following perspective frames (an aspect of) probabilistic inference as a model 
transformation. By definition, the set of all paths — each annotated with the 
transition probabilities — suffices to extract the reachability probability. These 
sets of paths may be represented in the computation tree (which is itself an MC). 


5 For an interesting example of this, see the “Queue” example in Sect. 6.

Example 4. We continue from Example 1. We put all paths of length three in
a computation tree in Fig. 4(a) (cf. the caption for state identifiers). The three
paths that reach the target are highlighted in red. This MC is highly redundant;
we may compress it to the MC in Fig. 4(b).


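Definition 1. For MC $M$ and horizon $h$, the computation tree (CT)
$\mathrm{CT}(M, h) = (\mathrm{Paths}_h, \iota, P', T')$ is an MC with states
corresponding to paths in $M$, i.e., $\mathrm{Paths}_h$, initial state $\iota$,
target states $T' = [\![ \Diamond^{\le h} T ]\!]$, and transition relation


$$P'(\pi, \pi') = \begin{cases} P(\pi_\downarrow, s') & \text{if } \pi_\downarrow \notin T \wedge \pi' = \pi \cdot s', \\ [\pi_\downarrow \in T \wedge \pi' = \pi] & \text{otherwise.} \end{cases} \tag{1}$$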


The CT contains (up to renaming) the same paths to the target as the original
MC. Notice that after h transitions, all paths are in a sink state, and thus we can
drop the step bound from the property and consider either finite or indefinite
horizons. The latter considers all paths that eventually reach the target. We
denote the probability mass of these paths with $\Pr_M(s \models \Diamond T)$
and refer to [7] for formal details. Then, we may compute bounded reachability
probabilities in the original MC by analysing unbounded reachability in the CT:


$$\Pr_M(\Diamond^{\le h} T) = \Pr_{\mathrm{CT}(M,h)}(\Diamond^{\le h} T') = \Pr_{\mathrm{CT}(M,h)}(\Diamond T').$$


The nodes in the CT have a natural topological ordering. The unbounded
reachability probability is then computed (efficiently in the CT's size) using
dynamic programming (i.e., topological value iteration) on the Bellman equation
for $s \notin T$:


$$\Pr_M(s \models \Diamond T) = \sum_{s' \in \mathrm{Succ}(s)} P(s, s') \cdot \Pr_M(s' \models \Diamond T).$$


For pMCs, the right-hand side naturally is a factorised form of the solution 
function $f$ that maps parameter values to the induced reachability probability,
i.e., $f(u) = \Pr_{M[u]}(\Diamond^{\le h} T)$ [22,24,33]. For bounded reachability (or acyclic pMCs),
this function amounts to a sum over all paths with every path reflected by a term 
of a polynomial, i.e., the sum is a polynomial. In sum-of-terms representation, 
the polynomial can be exponential in the number of parameters [5]. 

For computational efficiency, we need a smaller representation of the CT. As 
we only consider reachability of T, we may simplify [43] the notion of (weak) 
bisimulation [6] (in the formulation of [40]) to the following definition. 


Definition 2. For $M$ with states $S$, a relation $R \subseteq S \times S$ is a
(weak) bisimulation (with respect to $T$) if $s R s'$ implies
$\Pr_M(s \models \Diamond T) = \Pr_M(s' \models \Diamond T)$. Two states $s, s'$
are (weakly) bisimilar (with respect to $T$) if
$\Pr_M(s \models \Diamond T) = \Pr_M(s' \models \Diamond T)$.


Two MCs $M, M'$ are bisimilar, denoted $M \sim M'$, if the initial states are
bisimilar in the disjoint union of the MCs. It holds by definition that if
$M \sim M'$, then $\Pr_M(\Diamond T) = \Pr_{M'}(\Diamond T)$. The notion of
bisimulation can be lifted to pMCs [33].


6 Alternatively, on acyclic models, a large step bound $h > |S|$ suffices.

Idea 1: Given a symbolic description $P$ of an MC $[\![P]\!]$, efficiently
construct a concise MC $M$ that is bisimilar to $\mathrm{CT}([\![P]\!], h)$.


Indeed, the (compressed) CT in Fig. 4(b) and the CT in Fig. 4(a) are bisimilar.
We remark that we do not necessarily compute the bisimulation quotient of
$\mathrm{CT}([\![P]\!], h)$.
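The relation between the computation tree and its compression can be illustrated on the toy example. The sketch below builds the tree for horizon 3, evaluates it bottom-up, and groups nodes by their reachability probability, which is the (weak) bisimulation of Definition 2. It assumes the transition probabilities implied by Examples 1 and 5, treats target nodes as leaves instead of giving them self-loops, and uses exact fractions so that equal probabilities compare equal. All names are ours.

```python
from fractions import Fraction as F

P = {
    "s": {"s": F(3, 5), "t": F(2, 5)},
    "t": {"u": F(1, 2), "v": F(1, 2)},
    "u": {"u": F(1)},                    # assumption: the target is a sink
    "v": {"u": F(1, 2), "v": F(1, 2)},
}


def computation_tree(P, start, targets, horizon):
    """Map every path (a string of state names) to its successors."""
    tree, frontier = {}, [start]
    while frontier:
        path = frontier.pop()
        last = path[-1]
        if last in targets or len(path) - 1 == horizon:
            tree[path] = {}              # leaf: target hit or bound reached
            continue
        tree[path] = {path + t: p for t, p in P[last].items()}
        frontier.extend(tree[path])
    return tree


def reach_probabilities(tree, targets):
    """Dynamic programming from the leaves to the root."""
    value = {}
    for path in sorted(tree, key=len, reverse=True):
        if path[-1] in targets:
            value[path] = F(1)
        else:
            value[path] = sum(
                (p * value[child] for child, p in tree[path].items()), F(0))
    return value


def bisimulation_classes(value):
    """Group tree nodes that have the same reachability probability."""
    classes = {}
    for path, v in value.items():
        classes.setdefault(v, set()).add(path)
    return classes


tree = computation_tree(P, "s", {"u"}, 3)
value = reach_probabilities(tree, {"u"})
classes = bisimulation_classes(value)
```

Grouping by probability requires the probabilities to be known first, so this only demonstrates that a small bisimilar MC exists; constructing it without first building the tree is the actual goal.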


4.2 Logical Perspective 


The previous section defined weakly bisimilar chains and showed computational 
advantages, but did not present an algorithm. In this section we frame the finite 
horizon reachability probability as a logical query known as weighted model count- 
ing (WMC). In the next section we will show how this logical perspective yields 
an algorithm for constructing bisimilar MCs. 

Weighted model counting is well-known as an effective reduction for
probabilistic inference [16,57]. Let $\varphi$ be a logical sentence over
variables $C$. The weight function $W_C\colon C \to \mathbb{R}_{\ge 0}$ assigns a
weight to each logical variable. A total variable assignment
$\eta\colon C \to \{0,1\}$ by definition has weight
$\mathrm{weight}(\eta) = \prod_{c \in C} \bigl( W_C(c) \cdot \eta(c) + (1 - W_C(c)) \cdot (1 - \eta(c)) \bigr)$.
Then the weighted model count for $\varphi$ given $W_C$ is
$\mathrm{WMC}(\varphi, W_C) = \sum_{\eta \models \varphi} \mathrm{weight}(\eta)$.
Formally, we desire to compute a reachability query using a WMC query in the
following sense:


Idea 2: Given an MC $M$, efficiently construct a predicate $\varphi^C_{M,h}$ and
a weight function $W_C$ such that
$\Pr_M(\Diamond^{\le h} T) = \mathrm{WMC}(\varphi^C_{M,h}, W_C)$.


Consider initially the simplified case when the MC M is binary: every state has
at most two successors. In this case producing $(\varphi^C_{M,h}, W_C)$ is
straightforward:


Example 5. Consider the MC in Fig. 2(a), and note that it is binary. We
introduce logical variables called state/step coins
$C = \{ c_{s,i} \mid s \in S, i < h \}$ for every state and step. Assignments to
these coins denote choices of transitions at particular times: if the chain is
in state $s$ at step $i$, then it takes the transition to the lexicographically
first successor of $s$ if $c_{s,i}$ is true and otherwise takes the transition
to the lexicographically second successor. To construct the predicate
$\varphi^C_{M,3}$ we will need to write a logical sentence on coins whose models
encode accepting paths (red paths) in the CT in Fig. 4(a).

We start in state s = (0,0) (using state labels from the caption of Fig. 4). We
order states as s = (0,0) < t = (0,1) < u = (1,0) < v = (1,1). Then, $c_{s,0}$
is true if the chain transitions into state $s$ at time 0 and false if it
transitions to state $t$ at time 0. So, one path from $s$ to the target node
(1,0) is given by the logical sentence
$(c_{s,0} \wedge \neg c_{s,1} \wedge c_{t,2})$. The full predicate
$\varphi^C_{M,3}$ is therefore:


$$\varphi^C_{M,3} = (c_{s,0} \wedge \neg c_{s,1} \wedge c_{t,2}) \vee (\neg c_{s,0} \wedge c_{t,1}) \vee (\neg c_{s,0} \wedge \neg c_{t,1} \wedge c_{v,2}).$$

Each model of this sentence is a single path to the target. This predicate
$\varphi^C_{M,h}$ can clearly be constructed by considering all possible paths
through the chain, but later on we will show how to build it more efficiently.

Finally, we fix $W_C$: The weight for each coin is directly given by the
transition probability to the lexicographically first successor: for
$0 \le i < h$, $W_C(c_{s,i}) = 0.6$ and $W_C(c_{t,i}) = W_C(c_{v,i}) = 0.5$. The
WMC is indeed 0.42, reflecting Example 1.
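The weighted model count of Example 5 can be verified by brute force. The following sketch enumerates all total assignments to the coins of states s, t and v for h = 3 and sums the weights of the satisfying ones. Coins for state u are omitted, since unconstrained coins contribute a factor of one; this omission and the function names `wmc` and `phi` are our choices.

```python
from itertools import product


def wmc(formula, weights):
    """Sum the weights of all total assignments that satisfy `formula`."""
    coins = sorted(weights)
    total = 0.0
    for bits in product((False, True), repeat=len(coins)):
        eta = dict(zip(coins, bits))
        if formula(eta):
            w = 1.0
            for c in coins:
                w *= weights[c] if eta[c] else 1.0 - weights[c]
            total += w
    return total


H = 3
# W_C(c_{s,i}) = 0.6 and W_C(c_{t,i}) = W_C(c_{v,i}) = 0.5 for every step i.
WEIGHTS = {(name, i): w
           for name, w in (("s", 0.6), ("t", 0.5), ("v", 0.5))
           for i in range(H)}


def phi(eta):
    """Predicate of Example 5: one disjunct per accepting path."""
    return ((eta["s", 0] and not eta["s", 1] and eta["t", 2])
            or (not eta["s", 0] and eta["t", 1])
            or (not eta["s", 0] and not eta["t", 1] and eta["v", 2]))


count = wmc(phi, WEIGHTS)
```

The enumeration visits 2^9 assignments; it is meant as a specification of WMC, not as a practical algorithm.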


When the MC is not binary, it suffices to limit the out-degree of an MC to be 
at most two by adding auxiliary states, hence binarizing all transitions, cf. [38]. 


4.3 Connecting the Operational and the Logical Perspective 


Now that we have reduced bounded reachability to weighted model counting,
we reach a natural question: how do we perform WMC?⁷ Various approaches
to performing WMC have been explored; a prominent approach is to compile 
the logical function into a binary decision diagram (BDD), which supports fast 
weighted model counting [21]. In this paper, we investigate the use of a BDD- 
driven approach for two reasons: (i) BDDs admit straightforward support for 
parametric models. (ii) BDDs provide a direct connection between the logical and 
operational perspectives. To start, observe that the graph of the BDD, together 
with the weights, can be interpreted as an MC: 


Definition 3. Let $\varphi^X$ be a propositional formula over variables $X$ and
$<_X$ an ordering on $X$. Let
$\mathrm{BDD}(\varphi^X, <_X) = (V, v_0, X, \mathit{var}, \mathit{val}, E_0, E_1)$
be the corresponding BDD, and let $W$ be a weight function on $X$ with
$0 \le W(x) \le 1$. We define the MC
$\mathrm{BDD}_{MC}(\varphi^X, <_X, W) = (S, \iota, P, T)$ with $S = V$,
$\iota = v_0$,
$P(s) = \{ E_0(s) \mapsto W(\mathit{var}(s)),\ E_1(s) \mapsto 1 - W(\mathit{var}(s)) \}$
and $T = \{ v \in V \mid \mathit{val}(v) = 1 \}$.


These BDDs are intimately related to the computation trees discussed before. For
a binary MC $M$, the tree $\mathrm{CT}(M, h)$ is binary and can be considered as
a (not necessarily reduced) BDD. More formally, let us construct
$\mathrm{BDD}_{MC}(\varphi^C_{M,h}, <_C, W)$. We fix a total order on states.
Then we fix state/step coins $C = \{ c_{s,i} \mid s \in S, i < h \}$ and the
weights as in Example 5. Finally, let $<_C$ be an order on $C$ such that
$i < j$ implies $c_{s,i} <_C c_{s',j}$. Then:


$$\mathrm{CT}(M, h) \sim \mathrm{BDD}_{MC}(\varphi^C_{M,h}, <_C, W). \tag{2}$$


In the spirit of Idea 1, we thus aim to construct
$\mathrm{BDD}_{MC}(\varphi^C_{M,h}, <_C, W)$, a representation as outlined in
Idea 2, efficiently. Indeed, the BDD (as MC) in Fig. 4(c) is bisimilar to the MC
in Fig. 4(b).
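To illustrate how a BDD doubles as a small MC, the sketch below compiles the predicate of Example 5 into a reduced ordered BDD and computes the probability of reaching the true terminal. The construction is a naive Shannon expansion on a DNF and not the algorithm of any BDD library; the coin order sorts by step first, and all names are ours. Inner nodes are nested tuples `(coin, low, high)`, and the true and false terminals are the Python booleans.

```python
WEIGHTS = {(name, i): w
           for name, w in (("s", 0.6), ("t", 0.5), ("v", 0.5))
           for i in range(3)}
ORDER = sorted(WEIGHTS, key=lambda c: (c[1], c[0]))   # earlier steps first

# Predicate of Example 5 as a set of cubes (conjunctions of literals).
CUBES = frozenset({
    frozenset({(("s", 0), True), (("s", 1), False), (("t", 2), True)}),
    frozenset({(("s", 0), False), (("t", 1), True)}),
    frozenset({(("s", 0), False), (("t", 1), False), (("v", 2), True)}),
})


def restrict(cubes, coin, value):
    """Condition a DNF on coin = value."""
    out = set()
    for cube in cubes:
        lits = dict(cube)
        if coin in lits:
            if lits[coin] != value:
                continue                  # this cube is falsified
            del lits[coin]
        out.add(frozenset(lits.items()))
    return frozenset(out)


def build_bdd(cubes, order):
    """Return (root, table) where table holds the shared inner nodes."""
    table, memo = {}, {}

    def go(cubes, level):
        if frozenset() in cubes:          # an empty cube is always satisfied
            return True
        if not cubes:                     # no cube left: constant false
            return False
        key = (cubes, level)
        if key not in memo:
            coin = order[level]
            lo = go(restrict(cubes, coin, False), level + 1)
            hi = go(restrict(cubes, coin, True), level + 1)
            node = (coin, lo, hi)
            memo[key] = lo if lo == hi else table.setdefault(node, node)
        return memo[key]

    return go(cubes, 0), table


def reach_true(node):
    """Read the BDD as an MC: follow the high edge with the coin's weight."""
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    coin, lo, hi = node
    return WEIGHTS[coin] * reach_true(hi) + (1.0 - WEIGHTS[coin]) * reach_true(lo)


root, table = build_bdd(CUBES, ORDER)
probability = reach_true(root)
```

Coins that do not appear on a path are skipped without changing the result, because the weights of both outcomes of a coin sum to one.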


Idea 3: Represent a bisimilar version of the computation tree using a BDD. 


7 In this paper, we concentrate on reductions to exact WMC, leaving approximate
approaches for future work [14].

(a) Unfactorized computation tree for (h=1, n=3). (b) Factorized (h=2, n=2).


Fig. 5. Two computation trees for the motivating example in Sect. 1.


4.4 The Algorithmic Benefits of BDD Construction 


Thus far we have described how to construct a binarized MC bisimilar to the 
CT. Here, we argue that this construction has algorithmic benefits by filling in 
two details. First, the binarized representation is an important ingredient for 
compact BDDs. Second, we show how to choose a variable ordering that ensures 
that the BDDs grow linearly in the horizon. In sum, 


Idea 4: WMC encodings of binarized Markov Chains may increase compres- 
sion of computation trees. 


To see the benefits of binarized transitions, we return to the factory exam- 
ple in Sect. 1. Figure 5(a) gives a bisimilar computation tree for the 3-factory 
h = 1 example. However, in this tree, the states are unfactorized: each node in 
the tree is a joint configuration of factories. This tree has 8 transitions (one for 
each possible joint state transition) with 8 distinct probabilities. On the other 
hand, the bisimilar computation tree in Fig. 1(d) has binarized transitions: each 
node corresponds to a single factory’s state at a particular time-step, and each 
transition describes an update to only a single factory. This binarization enables 
the exploitation of new structure: in this case, the independence of the facto- 
ries leads to smaller BDDs, that is otherwise lost when considering only joint 
configurations of factories. 

Recall that the size of the ADD representation of the transition function is 
bounded from below by the number of distinct probabilities in the underlying 
MC: in this case, this is visualized by the number of distinct outgoing edge 
probabilities from all nodes in the unfactorized computation tree. Thus, a good 
binarization can have a drastically positive effect on performance. For the
running example, rather than $2^n$ different transition probabilities (with $n$
factories), the system now has only $4 \cdot n$ distinct transition probabilities!
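The gap between joint and factored transition probabilities can be reproduced on a simplified model. The sketch below assumes n independent components that each succeed with their own probability; the probabilities are hypothetical and not taken from the factory example of Sect. 1, which is not reproduced here. With a single distribution per component the factored count is 2·n instead of the 4·n stated for the factory model.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical success probabilities, one per independent component.
SUCCESS = [F(1, 3), F(1, 4), F(1, 6)]


def joint_probabilities(success):
    """Distinct probabilities of all joint outcomes of the components."""
    probs = set()
    for outcome in product((True, False), repeat=len(success)):
        p = F(1)
        for ok, q in zip(outcome, success):
            p *= q if ok else 1 - q
        probs.add(p)
    return probs


def factored_probabilities(success):
    """Distinct probabilities when each component is updated on its own."""
    return set(success) | {1 - q for q in success}


joint = joint_probabilities(SUCCESS)
factored = factored_probabilities(SUCCESS)
```

For these values the joint model needs 2^3 = 8 distinct terminals while the factored one needs 6, and the gap widens exponentially with the number of components.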


Causal Orderings. Next, we explore some of the engineering choices RUBICON 
makes to exploit the sequential structure in a MC when constructing the BDD for 
a WMC query. First, note that the transition matrix P(s, s’) implicitly encodes 
a distribution over state transition functions, S — S. To encode P as a BDD, 
we must encode each transition as a logical variable, similar to the situation in 
Sect. 4.2. In the case of binary transitions this is again easy. In the case of
non-binary transitions, we again introduce additional logical variables
[16,27,39,57]. This logical function has the following form:

$$f_P\colon \{0,1\}^C \to (S \to S). \tag{3}$$


Whereas the computation tree follows a fixed (temporal) order of states,
BDDs can represent the same function (and the same weighted model count)
using an arbitrary order. Note that the BDD's size and structure depend
drastically both on the construction of the propositional formula and on the
order of the variables in that encoding. We can bound the size of the BDD by
enforcing a variable order based on the temporal structure of the original MC.
Specifically, given $h$ coin collections $C = C_1 \times \ldots \times C_h$, one
can generate a function $f$ describing the $h$-length paths via repeated
applications of $f_P$:


$$f\colon \{0,1\}^C \to \mathrm{Paths}_h, \quad f(C_1, \ldots, C_h) = \bigl( f_P(C_h) \circ \ldots \circ f_P(C_1) \bigr)(\iota). \tag{4}$$


Let $\psi$ denote an indicator for the reachability property as a function over
paths, $\psi\colon \mathrm{Paths}_h \to \{0,1\}$ with
$\psi(\pi) = [\pi \in [\![ \Diamond^{\le h} T ]\!]]$. We call predicates formed
by composition with $f_P$, i.e., $\varphi = \psi \circ f_P$, causal encodings
and orderings on $c_{i,t} \in C$ that are lexicographically sorted in time,
$t_1 < t_2 \implies c_{i,t_1} < c_{j,t_2}$, causal orderings. Importantly,
causally ordered/encoded BDDs grow linearly in the horizon $h$
[61, Corollary 1]. More precisely, let $\varphi^C_{M,h}$ be causally encoded
where $|C| = h \cdot m$. The causally ordered BDD for $\varphi^C_{M,h}$ has at
most $h \cdot |S \times S_\psi| \cdot m \cdot 2^m$ nodes, where $|S_\psi| = 2$
for reachability properties.⁸ However, while the worst-case growth is linear in
the horizon, constructing that BDD may induce a super-linear cost in the size,
e.g., function composition using BDDs is super-linear!

Figure 5(b) shows the motivating factory example with 2 factories and h = 2. 
The variables are causally ordered: the factories in time step 1 occur before the 
factories in time step 2. For n factories, a fixed number f(n) of nodes are added to 
the BDD upon each iteration, guaranteeing growth on the order O( f(n)-h). Note 
the factorization that occurs: the BDD has node sharing (node ce) is reused) 
that yields additional computational benefits. 


Summary and Remaining Steps. The operational view highlights that we want to 
compute a transformation of the original input MC M. The logical view presents 
an approach to do so efficiently: By computing a BDD that stores a predicate 
describing all paths that reach the target, and interpreting and evaluating the 
(graph of the) BDD as an MC. In the following section, we discuss the two steps 
that we follow to create the BDD: (i) From $P$ generate $P'$ such that
$\mathrm{CT}([\![P]\!], h) \sim [\![P']\!]$. (ii) From $P'$ generate $M$ such
that $M = [\![P']\!]$.


5 RUBICON 


We present RUBICON which follows the two steps outlined above. For exposition,
we first describe a translation of monolithic PRISM programs to Dice programs
and then extend this translation to admit modular programs. Technical steps
and extensions are deferred to [38, Appendix].


8 Generally, it is the smallest number of states required for a DFA to recognize $\psi$.


    module main
      x : [0..1] init 0;
      y : [0..2] init 1;
      [] x=0 & y<2 -> 0.5:x'=1 + 0.5:y'=y+1;
      [] y=2 -> 1:y'=y-1;
      [] x=1 & y!=2 -> 1:x'=y & y'=x;
    endmodule
    property: P=? [F<=2 (x=0 & y=2)]

(a) PRISM program with reachability query

(b) MC (diagram not reproduced)

    let s = init() in          // init state
    let T = hit(s) in          // init target
    let (s, T) = if !T
      then let s' = step(s) in (s', hit(s'))
      else (s, T) in
    let (s, T) = if !T
      then let s' = step(s) in (s', hit(s'))
      else (s, T) in
    T

(c) Main Dice program for h=2

    fun init() { (0,1) }
    fun hit((x,y)) { x == 0 && y == 2 }
    fun step((x,y)) {
      if x==0 && y<2 then
        if flip 0.5 then (1,y) else (x,y+1)
      else if y==2 then (x,y-1)
      else if x==1 && y!=1 then (y,x)
      else (x,y)
    }

(d) Dice auxiliary functions


Fig. 6. From PRISM to Dice using RUBICON.


Dice Preliminaries. We give a brief description of Dice, a probabilistic pro- 
gramming language (PPL) introduced in [39]. A PPL is a programming language 
augmented with a primitive notion of random choice: for instance, in Dice, a 
Bernoulli random variable is introduced by the syntax flip 0.5. The syntax 
of Dice is similar to the programming language OCaml: local variables are
introduced by the syntax let x = e_1 in e_2, where e_1 and e_2 are expressions, i.e.,
sub-programs. Dice supports procedures, bounded integers, bounded loops, and 
standard control flow via if-statements. 

One goal of a PPL is to perform probabilistic inference: compute the prob- 
ability that the program returns a particular value. Inference on the tiny Dice 
program let x = flip 0.1 in x would yield that true is returned with proba- 
bility 0.1. The Dice compiler performs probabilistic inference via weighted model 
counting and BDD compilation. In doing so, it accomplishes the non-trivial tasks 
of: (i) choosing a logical encoding for probabilistic programs, (ii) establishing
good variable orderings, (iii) efficiently manipulating and constructing BDDs,
and (iv) performing WMC. For details, we refer the reader to [39].

RUBICON uses Dice to effectively construct a BDD and perform WMC on a 
Dice program that reflects a description of some computation tree. This imple- 
mentation exploits the structure that was described in Sect. 4.4: in particular, the 
BDD generated in Fig. 5(b) is exactly the BDD that will be generated by Dice 
from the output of RUBICON. The variable ordering used by Dice is given by 
the order in which program variables are introduced, and RUBICON’s translation 
was designed with this variable ordering in mind. 


Transpiling PRISM to Dice. We present the core translation routine implemented
in RUBICON. We note that the ultimate performance of RUBICON is heavily
dependent on the quality of this translation. We evaluate the performance in the
next section.

The PRISM specification language consists of one or more reactive modules 
(or partially synchronized state machines) that may interact with each other. Our 
example in Fig. 1(b) illustrates fully synchronized state machines. While PRISM 
programs containing multiple modules can be flattened into a single monolithic 
program, this yields an exponential blow-up: If one flattens the n modules in
Fig. 1(b) to a single module, the resulting program has $2^n$ updates per command.
This motivates our direct translation of PRISM programs containing multiple 
modules. 


Monolithic Prism Programs. We explain most ideas on PRISM programs that 
consist of a single “monolithic” module before we address the modular translation 
at the end of the subsection. A module has a set of bounded variables, and the 
valuations of these variables span the state space of the underlying MC. Its 
transitions are described by guarded commands of the form: 


[act] guard -> p_1 : update_1 + ... + p_n : update_n


The action name act is only relevant in the modular case and can be ignored for 
now. The guard is a Boolean expression over the module’s variables. If the guard 
evaluates to true for some state (a valuation), then the module evolves into one 
of the n successor states by updating its variables. An update is chosen according 
to the probability distribution given by the expressions p_1, ..., p_n. In every state
enabling the guard, the evaluation of p_1, ..., p_n must sum up to one. A set of
guards overlap if they all evaluate to true on a given state. The semantics of 
overlapping guards in the monolithic setting is to first uniformly select an active 
guard and then apply the corresponding stochastic transition. Finally, a self-loop 
is implicitly added to states without an enabled guard. 


Example 6. We present our translation primarily through example. In Fig. 6(a),
we give a PRISM program for an MC. The program contains two variables x and
y, where x is either zero or one, and y is between zero and two. There are thus 6
different states. We denote states as tuples with the x- and y-value. We depict
the MC in Fig. 6(b). From state (0,0), (only) the first guard is enabled and thus 
there are two transitions, each with probability a half: one in which x becomes 
one and one in which y is increased by one. Finally, there is no guard enabled in 
state (1,1), resulting in an implicit self-loop. 


Translation. All Dice programs consist of two parts: a main routine, which is 
run by default when the program starts, and function declarations that declare 
auxiliary functions. We first define the auxiliary functions. For simplicity let us 
temporarily assume that no guards overlap and that probabilities are constants, 
i.e., not state-dependent. 

The main idea in the translation is to construct a Dice function step that,
given the current state, outputs the next state. Because a monolithic PRISM
program is almost a sequential program, in its most basic version, the step
function is straightforward to construct using built-in Dice language
primitives: we simply build a large if-else block corresponding to each command.
This block iteratively considers each command's guard until it finds one that is
satisfied. To perform the corresponding update we flip a coin, based on the
probabilities corresponding to the updates, to determine which update to
perform. If no command is enabled, we return the same state in accordance with
the implicit self-loop. Figure 6(d) shows the program blocks for the PRISM
program from Fig. 6(a) with target state [x = 0, y = 2]. There are two other
important auxiliary functions. The init function simply returns the initial
state by translating the initialization statements from PRISM, and the hit
function checks whether the current state is a target state that is obtained
from the property.


    module main
      x : [0..2] init 2;
      y : [0..2] init 1;
      [] x>1 -> 1:x'=y & y'=x;
      [] y<2 -> 1:x'=min(x+1,2);
    endmodule

(a)

    fun step((x,y)) {
      let aEn = (x>1) in
      let bEn = (y<2) in
      let act = selectFrom(aEn, bEn) in
      if act==1 then (y,x)
      else if act==2 then (min(x+1,2),y)
      else (x,y)
    }

(b)


Fig. 7. PRISM program with overlapping guards and its translation (conceptually).


    module m1
      x : [0..1] init 0;
      [a] x=1 -> 1:x'=1-y;
      [b] x=0 -> 1:x'=0;
    endmodule
    module m2
      y : [0..1] init 0;
      [b] y=1 -> 0.5:y'=0 + 0.5:y'=1;
      [c] true -> 1:x'=1-x;
    endmodule

(a)

    fun step((x,y)) {
      let aEn = (x==1) in
      let bEn = (x==0 && y==1) in
      let cEn = true in
      let act = selectFrom(aEn, bEn, cEn) in
      if act==1 then (1-y, y)
      else if act==2 then (0, flip 0.5)
      else if act==3 then (1-x, y)
      else (x, y)
    }

(b)


Fig. 8. Modular PRISM and resulting Dice step function.

Now we outline the main routine, given for this example in Fig. 6(c). This 
function first initializes the state. Then, it calls step 2 times, checking on each 
iteration using hit if the target state is reached. Finally, we return whether we 
have been in a target state. The probability to return true corresponds to the 
reachability probability on the underlying MC specified by the PRISM program. 
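The translated program of Fig. 6 can be mimicked in Python to check the returned probability. The sketch below does not compile to a BDD as Dice does; it swaps in plain enumeration of all flip outcomes, which is exact but exponential in the number of flips. The `flip` callback and the function `enumerate_runs` are our own constructions and not part of Dice.

```python
from collections import defaultdict


def enumerate_runs(program):
    """Exact inference by exploring both outcomes of every flip."""
    dist = defaultdict(float)
    pending = [[]]                       # forced prefixes of flip outcomes
    while pending:
        forced = pending.pop()
        taken, prob = [], 1.0

        def flip(p):
            nonlocal prob
            if len(taken) < len(forced):
                outcome = forced[len(taken)]
            else:
                outcome = True
                pending.append(taken + [False])   # visit the other branch later
            taken.append(outcome)
            prob *= p if outcome else 1.0 - p
            return outcome

        dist[program(flip)] += prob
    return dict(dist)


def init():
    return (0, 1)


def hit(state):
    x, y = state
    return x == 0 and y == 2


def step(state, flip):
    """Mirror of the Dice step function in Fig. 6(d)."""
    x, y = state
    if x == 0 and y < 2:
        return (1, y) if flip(0.5) else (x, y + 1)
    if y == 2:
        return (x, y - 1)
    if x == 1 and y != 1:
        return (y, x)
    return (x, y)


def main(flip, horizon=2):
    """Mirror of Fig. 6(c): call step up to `horizon` times, track the target."""
    s = init()
    reached = hit(s)
    for _ in range(horizon):
        if not reached:
            s = step(s, flip)
            reached = hit(s)
    return reached


dist = enumerate_runs(main)
```

The same enumerator applied to `lambda flip: flip(0.1)` corresponds to the Dice program `let x = flip 0.1 in x`.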


Overlapping Guards. PRISM allows multiple commands to be enabled in the
same state, with semantics to uniformly at random choose one of the enabled
commands to evaluate. Dice has no primitive notion of this construct.⁹ We
illustrate the translation in Fig. 7(a) and Fig. 7(b). It determines which guards
aEn, bEn, cEn are enabled. Then, we randomly select one of the commands which
are enabled, i.e., we uniformly at random select a true bit from a given tuple
of bits. We store the index of that bit and use it to execute the corresponding
command.


9 One cannot simply condition on selecting an enabled guard as this redistributes
probability mass over all paths and not only over paths with the same prefix.
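The uniform selection among enabled commands can be modelled as a distribution over indices. The sketch below is a hypothetical model of selectFrom, not its Dice implementation: it returns the 1-based index of each true bit with equal probability and index 0 when no bit is true, and then applies it to the program of Fig. 7.

```python
from fractions import Fraction as F


def select_from(*enabled):
    """Distribution over 1-based indices of the true bits (0 if none)."""
    indices = [i for i, bit in enumerate(enabled, start=1) if bit]
    if not indices:
        return {0: F(1)}
    return {i: F(1, len(indices)) for i in indices}


def step(state):
    """Successor distribution for the PRISM program of Fig. 7."""
    x, y = state
    updates = {
        1: (y, x),                  # command guarded by x > 1
        2: (min(x + 1, 2), y),      # command guarded by y < 2
        0: (x, y),                  # implicit self-loop
    }
    dist = {}
    for act, p in select_from(x > 1, y < 2).items():
        succ = updates[act]
        dist[succ] = dist.get(succ, F(0)) + p
    return dist


both = step((2, 1))     # both guards hold
single = step((2, 2))   # only the first guard holds
```

Selecting the command before executing it keeps the choice local to the current state, in line with the remark in the footnote about conditioning.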


Modular Prism Programs. For modular PRISM programs, the action names at 
the front of PRISM commands are important. In each module, there is a set of 
action names available. An action is enabled if each module that contains this 
action name has (at least) one command with this action whose guard is satisfied. 
Commands with an empty action are assumed to have a globally unique action 
name, so in that case the action is enabled iff the guard is enabled. Intuitively, 
once an action is selected, we randomly select a command per module in all mod- 
ules containing this action name. Our approach resembles that for overlapping 
guards described above. See Fig. 8 for an intuitive example. To automate this, 
the updates require more care, cf. [38] for details. 
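The enabledness rule for synchronising actions can be written down directly. The sketch below encodes the guards of the two modules in Fig. 8 as Python functions over the state (x, y) and checks which actions are enabled; the data layout and the name `enabled_actions` are ours, and updates are left out.

```python
# Modules of Fig. 8 as lists of (action, guard) pairs.
MODULES = {
    "m1": [("a", lambda x, y: x == 1), ("b", lambda x, y: x == 0)],
    "m2": [("b", lambda x, y: y == 1), ("c", lambda x, y: True)],
}


def enabled_actions(modules, state):
    """An action is enabled iff every module that contains it has a command
    with that action whose guard holds in `state`."""
    actions = sorted({act for cmds in modules.values() for act, _ in cmds})
    enabled = {}
    for act in actions:
        participating = [cmds for cmds in modules.values()
                         if any(a == act for a, _ in cmds)]
        enabled[act] = all(
            any(a == act and guard(*state) for a, guard in cmds)
            for cmds in participating
        )
    return enabled
```

For the state (0, 1) this yields that b and c are enabled and a is not, which matches the values of aEn, bEn and cEn in Fig. 8(b).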


Implementation. RUBICON is implemented on top of STORM’s Python API and 
translates PRISM to Dice fully automatically. RUBICON supports all MCs in the 
PRISM benchmark suite and a large set of benchmarks from the PRISM website 
and the QVBS [35], with the note that we require a single initial state and ignore 
reward declarations. Furthermore, we currently do not support the hide/restrict 
process-algebraic compositions and some integer operations. 


6 Empirical Comparisons 


We compare and contrast the performance of STORM against RUBICON to
empirically demonstrate the following strengths and weaknesses:¹⁰

Explicit Model Checking (STORM) represents the MC explicitly in a sparse 
matrix format. The approach suffers from the state space explosion, but has 
been engineered to scale to models with many states. Besides the state space, 
the sparseness of the transition matrix is essential for performance. 

Symbolic Model Checking (STORM) represents the transition matrix and 
the reachability probability as an ADD. This method is strongest when the 
transition matrix and state vector have structure that enables a small ADD 
representation, like symmetry and sparsity. 

RUBICON represents the set of paths through the MC as a (logical) BDD. This 
method excels when the state space has structure that enables a compact 
BDD representation, such as conditional independence, and hence scales well 
on examples with many (asymmetric) parallel processes or queries that admit 
a compact representation. 
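
To make the first item concrete, explicit model checking of a step-bounded reachability query can be pictured as repeated sparse matrix-vector multiplication. The following is a minimal sketch under that reading and is not STORM's implementation; the dictionary-based sparse format and the name `bounded_reachability` are assumptions.

```python
from typing import Dict, Set

# Sparse transition matrix: state -> {successor: probability}.
SparseMC = Dict[int, Dict[int, float]]


def bounded_reachability(mc: SparseMC, targets: Set[int], h: int) -> Dict[int, float]:
    """Probability to reach `targets` within `h` steps, for every state."""
    x = {s: (1.0 if s in targets else 0.0) for s in mc}
    for _ in range(h):
        # One iteration touches only the non-zero entries of the matrix,
        # which is why sparsity matters for the explicit approach.
        x = {
            s: 1.0 if s in targets else sum(p * x[t] for t, p in mc[s].items())
            for s in mc
        }
    return x


# From state 0, move to the target 2 with probability 0.5; otherwise stay
# in 0 or fall into the sink 1.
mc = {0: {0: 0.25, 1: 0.25, 2: 0.5}, 1: {1: 1.0}, 2: {2: 1.0}}
result = bounded_reachability(mc, {2}, 2)
print(result[0])  # 0.5 + 0.25 * 0.5 = 0.625
```

The vector `x` has one entry per state, so the memory footprint grows with the state space regardless of any structure in the model.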


The sources, benchmarks and binaries are archived.¹¹
There is no clear-cut model checking technique that is superior to others (see 
QCOMP [12]). We demonstrate that, while RUBICON is not competitive on some 


10 All experiments were conducted with STORM version 1.6.0 on the same server with 
512GB of RAM, using a single thread of execution. Time was reported using the 
built-in Unix time utility; the total wall-clock time is reported. 

11 http://doi.org/10.5281/zenodo.4726264 and http://github.com/sjunges/rubicon. 




[Figure 9: eight line plots. Panels (a) Weather Factory and (b) Weather Factory 2 are plotted against the number of factories; panels (c) Herman-13, (d) Herman-13 (R), (e) Herman-17, (f) Herman-17 (R), (g) Herman-19 (R), and (h) Queues are plotted against the horizon (h).]

Fig. 9. Scaling plots comparing RUBICON, STORM’s symbolic engine, and
STORM’s explicit engine. An “(R)” in the caption denotes random parameters.


commonly used benchmarks [52], it improves a modern model checking portfolio 
approach on a significant set of benchmarks. Below we provide several natural 
models on which RUBICON is superior to one or both competing methods. We 
also evaluated RUBICON on standard benchmarks, highlighting that RUBICON 
is applicable to models from the literature. We see that RUBICON is effective on 
HERMAN (elaborated below), has mixed results on BRP [38, Appendix], and is 
currently not competitive on some other standard benchmarks (NAND, EGL, 
LeaderSync). While not exhaustive, our selected benchmarks highlight specific 
strengths and weaknesses of RUBICON. Finally, a particular benefit of RUBICON 
is fast sampling of parametric chains, which we demonstrate on HERMAN and 
our factory example. 


Scaling Experiments. In this section, we describe several scaling experiments 
(Fig. 9), each designed to highlight a specific strength or weakness. 


Weather Factories. First, Fig. 9(a) describes a generalization of the motivating
example from Sect. 1. In this model, the probability that each factory is on strike 
is dependent on a common random event: whether or not it is raining. The rain 
on each day is dependent on the previous day’s weather. We plot runtime for 
an increasing number of factories for h=10. Both STORM engines eventually fail 
due to the state explosion and the number of distinct probabilities in the MC. 
RUBICON is orders of magnitude faster in comparison, highlighting that it does 
not depend on complete independence among the factories. Figure 9(b) shows 
a more challenging instance where the weather includes wind which, each day, 
affects whether or not the sun will shine, which in turn affects strike probability. 


Herman. Herman is based on a distributed protocol [37] that has been well- 
studied [1,53] and which is one of the standard benchmarks in probabilistic 
model checking. Rather than computing the expected steps to ‘stabilization’, we 
consider the step-bounded probability of stabilization. Usually, all participants in 
the protocol flip a coin with the same bias. The model is then highly symmetric,
and hence is amenable to symbolic representation with ADDs. Figures 9(c) and 
9(e) show how the methods scale on Herman examples with 13 and 17 parallel 
processes. We observe that the explicit approach scales very efficiently in the 
number of iterations but has a much higher up-front model-construction cost, 
and hence can be slower for fewer iterations. 

To study what happens when the coin biases vary over the protocol partici- 
pants, we made a version of the Herman protocol where each participant’s bias 
is randomly chosen, which ruins the symmetry and so causes the ADD-based
approaches to scale significantly worse (Figs. 9(d), 9(f), and 9(g)); we see
that symbolic ADD-based approaches completely fail on Herman 17 and Herman 19
(the curve terminating denotes a memory error). RUBICON and the
explicit approach are unaffected by varying parameters. 


Queues. The Queues model has K queues of capacity Q where every step, tasks 
arrive with a particular probability. Three queues are of type 1, the others of 
type 2. We ask for the probability that all queues of type 1 and at least one queue
of type 2 are full within k steps. Contrary to the previous models, the ADD
representation of the transition matrix is small. Figure 9(h) shows the relative 
scaling on this model with K = 8 and Q = 3. We observe that ADDs quickly 
fail due to inability to concisely represent the probability vector x from Sect. 3. 
RUBICON outperforms explicit model checking until h = 10. 


Sampling Parametric Markov Chains. We evaluate performance for the 
pMC sampling problem outlined in Sect. 2. Table 1 gives for four models the time 
to construct the BDD and to perform WMC, as well as the time to construct 
an ADD in STORM and to perform model checking with this ADD. Finally, 
we show the time for STORM to compute the solution function of the pMC 
(with the explicit representation). The pMC sampling in STORM — symbolic and 
explicit — computes the reachability probabilities with concrete probabilities. 
RUBICON, in contrast, constructs a ‘parametric’ BDD once, amortizing the cost 
of repeated efficient evaluation. The ‘parametric BDD’ may be thought of as a 
solution function, as discussed in Sect. 4.1. STORM cannot compute these solution 
functions as efficiently. We observe in Table 1 that fast parametric sampling is 
realized in RUBICON: for instance, after a 40s up-front compilation of the factories 
example with 15 factories, we have a solution function in factorized form and it 
costs an order of magnitude less time to draw a sample. Hence, sampling and 
computation of solution functions of pMCs is a major strength of RUBICON. 
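
The compile-once, evaluate-often pattern behind these numbers can be sketched on a toy decision diagram. The node encoding and the names `wmc` and `bdd` below are hypothetical and far simpler than the data structures used by Dice or RUBICON; the sketch only shows that, once the diagram exists, each parameter instantiation costs a single bottom-up pass.

```python
from typing import Dict, Tuple, Union

# A decision node is (variable, high_child, low_child); leaves are booleans.
Node = Union[bool, Tuple[str, "Node", "Node"]]


def wmc(node: Node, weight: Dict[str, float]) -> float:
    """Weighted model count of a decision diagram.

    `weight[v]` is the probability that variable v is true; the variables
    are assumed to be independent coin flips.
    """
    cache: Dict[int, float] = {}

    def go(n: Node) -> float:
        if isinstance(n, bool):
            return 1.0 if n else 0.0
        if id(n) not in cache:
            var, high, low = n
            cache[id(n)] = weight[var] * go(high) + (1.0 - weight[var]) * go(low)
        return cache[id(n)]

    return go(node)


# Compiled once: "x1 or x2 comes up true".
bdd: Node = ("x1", True, ("x2", True, False))

# Evaluated many times, once per parameter instantiation.
for p in (0.1, 0.5, 0.9):
    print(p, wmc(bdd, {"x1": p, "x2": p}))
```

The diagram plays the role of a factorized solution function: the weights are the only input that changes between samples.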


7 Discussion, Related Work, and Conclusion 


We have demonstrated that the probabilistic inference approach to probabilis- 
tic model checking can improve scalability on an important class of problems. 
Another benefit of the approach is for sampling pMCs. These are used to evaluate 
e.g., robustness of systems [1], or to synthesise POMDP controllers [41]. Many 
state-of-the-art approaches [17, 19,24] require the evaluation of various instanti- 
ated MCs, and RUBICON is well-suited to this setting. More generally, support 
Table 1. Sampling performance comparison and pMC model checking, time in seconds.

Model                   RUBICON           STORM (w/ ADD)     STORM (explicit)
                        Build    WMC      Build    Solve     pMC solving

Herman R 13 (h = 10)    3        <1       32       18        >1800
Herman R 17 (h = 10)    45       28       >1800    -         >1800
Factories 12 (h = 15)   2        <1       59       286       >1800
Factories 15 (h = 15)   40       4        >1800    -         >1800


of inference techniques opens the door to a variety of algorithms for additional 
queries, e.g., computing conditional probabilities [3,8]. 

An important limitation of probabilistic inference is that only finitely many 
paths can be stored. For infinite horizon properties in cyclic models, an infi- 
nite set of arbitrarily long paths would be required. However, as standard in 
probabilistic model checking, we may soundly approximate infinite horizons. 
Additionally, the inference algorithm in Dice does not support a notion of non- 
determinism. It thus can only be used to evaluate MCs, not Markov decision pro- 
cesses. However, [61] illustrates that this is not a conceptual limitation. Finally, 
we remark that RUBICON achieves its performance with a straightforward trans- 
lation. We are optimistic that this is a first step towards supporting a larger 
class of models by improving the transpilation process for specific problems. 


Related Work. The tight connection with inference has been recently inves- 
tigated via the use of model checking for Bayesian networks, the prime model 
in probabilistic inference [56]. Bayesian networks can be described as probabilis- 
tic programs [10] and their operational semantics coincides with MCs [31]. Our 
work complements these insights by studying how symbolic model checking can 
be sped up by probabilistic inference. 

The path-based perspective is tightly connected to factored state spaces. Fac- 
tored state spaces are often represented as (bipartite) Dynamic Bayesian net- 
works. ADD-based model checking for DBNs has been investigated in [25], with 
mixed results. Their investigation focuses on using ADDs for factored state 
space representations. We investigate using BDDs representing paths. Other 
approaches also investigated a path-based view: The symbolic encoding in [28] 
annotates propositional sub-formulae with probabilities, an idea closer to ours. 
The underlying process implicitly constructs an (uncompressed) CT leading to 
an exponential blow-up. Likewise, an explicit construction of a computation 
tree without factorization is considered in [62]. Compression by grouping paths 
has been investigated in two approximate approaches: [55] discretises probabil- 
ities and encodes into a satisfiability problem with quantifiers and bit-vectors. 
This idea has been extended [60] to a PAC algorithm by purely propositional 
encodings and (approximate) model counting [14]. Finally, factorisation exploits 
symmetries, which can be exploited using symmetry reduction [50]. We highlight 
that the latter is not applicable to the example in Fig. 1(d). 
There are many techniques for exact probabilistic inference in various forms
of probabilistic modeling, including probabilistic graphical models [20,54]. The 
semantics of graphical models make it difficult to transpile PRISM programs, 
since commonly used operations are lacking. Recently, probabilistic program- 
ming languages have been developed which are more amenable to transpila- 
tion [13, 23, 29,30,59]. We target Dice due to the technical development that it 
enables in Sect. 4, which enabled us to design and explain our experiments. Clos- 
est related to Dice is ProbLog [27], which is also a PPL that performs inference 
via WMC; ProbLog has different semantics from Dice that make the transla- 
tion less straightforward. The paper [61] uses an encoding similar to Dice for 
inferring specifications based on observed traces. ADDs and variants have been 
considered for probabilistic inference [15,18,58], which is similar to the process 
commonly used for probabilistic model checking. The planning community has 
developed their own disjoint sets of methods [45]. Some ideas from learning have 
been applied in a model checking context [11]. 


8 Conclusion 


We present RUBICON, bringing probabilistic AI to the probabilistic model check- 
ing community. Our results show that RUBICON can outperform probabilistic 
model checkers on some interesting examples, and that this is not a coincidence 
but rather the result of a significantly different perspective. 
Abstract. Partially-Observable Markov Decision Processes (POMDPs)
are a well-known stochastic model for sequential decision making under 
limited information. We consider the EXPTIME-hard problem of syn- 
thesising policies that almost-surely reach some goal state without ever 
visiting a bad state. In particular, we are interested in computing the 
winning region, that is, the set of system configurations from which a 
policy exists that satisfies the reachability specification. A direct appli- 
cation of such a winning region is the safe exploration of POMDPs by, 
for instance, restricting the behavior of a reinforcement learning agent to 
the region. We present two algorithms: A novel SAT-based iterative app- 
roach and a decision-diagram based alternative. The empirical evaluation 
demonstrates the feasibility and efficacy of the approaches. 


1 Introduction 


Partially observable Markov decision processes (POMDPs) constitute the stan- 
dard model for agents acting under partial information in uncertain environ- 
ments [34,52]. A common problem is to find a policy for the agent that maxi- 
mizes a reward objective [36]. This problem is undecidable, yet, well-established 
approximate [27], point-based [43], or Monte-Carlo-based [49] methods exist. 
In safety-critical domains, however, one seeks a safe policy that exhibits strict 
behavioral guarantees, for instance in the form of temporal logic constraints [44]. 
The aforementioned methods are not suitable to deliver provably safe policies. 
In contrast, we employ almost-sure reach-avoid specifications, where the proba- 
bility to reach a set of avoid states is zero, and the probability to reach a set of 
goal states is one. Our Challenge 1 is to compute a policy that adheres to such 
specifications. Furthermore, we aim to ensure the safe exploration of a POMDP, 
with safe reinforcement learning [23] as direct application. Challenge 2 is then 
to compute a large set of safe policies for the agent to choose from at any state
of the POMDP. Such sets of policies are called permissive policies [21,31]. 


POMDP Almost-Sure Reachability Verification. Let us remark that in POMDPs, 
we cannot directly observe in which state we are, but we are in general able to 
track a belief, i.e., a distribution over states that describes where in the POMDP 
we may be. The belief allows us to formulate the following verification task: 


For a POMDP, sets of target and avoid states, and a belief, does a policy 
exist such that we reach the target states without ever visiting a bad state? 


The underlying EXPTIME-complete problem requires—in general—policies 
with access to memory of exponential size in the number of states [4,18]. For 
safe exploration and, e.g., to support nested temporal properties, the ability to 
solve this problem for each belief in the POMDP is essential. 

We base our approaches on the concept of a winning region, also referred to 
as controllable or attractor regions. Such regions are sets of winning beliefs from 
which a policy exists that guarantees to satisfy an almost-sure specification. 
The verification task relates three concrete problems which we tackle in this 
paper: (1) Decide whether a belief is winning, (2) compute the maximal winning 
region, and (3) compute a large yet not necessarily maximal winning region. We 
now outline our two approaches. First, we directly exploit model checking for 
MDPs [5] using belief abstractions. The second, much faster approach iteratively 
exploits satisfiability solving (SAT) [8]. Finally, we define a scheme to enable safe 
reinforcement learning [23] for POMDPs, referred to as shielding [2,30]. 


MDP Model Checking. A prominent approach gives the semantics of a POMDP 
via an (infinite) belief MDP whose states are the beliefs in the POMDP [36]. 
For almost-sure specifications, it is sufficient to consider belief-supports rather 
than beliefs. In particular, two beliefs with the same support are either both in a 
winning region or not [47]. We abstract a belief MDP into a finite belief-support 
MDP, whose states are the supports of beliefs. The (maximal) winning region consists of
(all) states of the belief-support MDP from which one can almost surely reach 
a belief support that contains a goal state without visiting belief support states 
that contain an avoid state. 

To find a winning region in the POMDP, we thus just have to solve almost- 
sure reachability in this finite MDP. The number of belief supports, however, is 
exponentially large in the number of POMDP states, threatening the efficient 
application of explicit state verification approaches. Symbolic state space rep- 
resentations are a natural option to mitigate this problem [7]. We construct a 
symbolic description of the belief support MDP and apply state-of-the-art sym- 
bolic model checking. Our experiments show that this approach (referred to as 
MDP Model Checking) does in general not alleviate the exponential blow-up. 


Incremental SAT Solving. While the belief support model exploits the structure 
of the belief support MDP by using a symbolic state space representation, it does 
not exploit elementary properties of the structure of winning regions. To overcome 
the scalability challenge, we aim to exploit information from the original POMDP, 
rather than working purely on the belief-support MDP. In a nutshell, our approach
computes the winning regions in a backward fashion by optimistically searching
for policies without memory on the POMDP level. Concretely, starting from the
belief support states that shall be reached almost-surely, further states are added
to the winning region if we can quickly find a policy that reaches these states without
visiting those that are to be avoided. We search for these policies by incrementally
employing an encoding based on SAT solving. This symbolic encoding avoids an 
expensive construction of the belief support MDP. The computed winning region 
directly translates to sufficient constraints on the set of safe policies, i.e., each pol- 
icy satisfying these constraints satisfies, by construction, the specification. The key 
idea is to successively add short-cuts corresponding to already known safe policies. 
These changes to the structure of the POMDP are performed implicitly on the SAT 
encoding. The resulting scalable method is sound, but not complete by itself. How- 
ever, it can be rendered complete by trading off a certain portion of the scalability; 
intuitively one would eventually search for policies with larger amounts of memory. 


Shielding. An agent that stays within a winning region is guaranteed to adhere 
to the specification. In particular, we shield (or mask) any action of the agent 
that may lead out of the winning region [1,39,42]. We stress that the shape of 
the winning region is independent of the transition probabilities or rewards in 
the POMDP. This independence means that the only prior knowledge we need to 
assume is the topology, that is, the graph of the POMDP. A pre-computation of 
the winning region thus yields a shield and allows us to restrict an agent to safely 
explore environments, which is the essential requirement for safe reinforcement 
learning [22,23] of POMDPs. The shield can be used with any RL agent [2]. 
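
The masking step of a shield can be sketched as follows. We assume, for illustration only, that the winning region is available as a set of belief supports and that a function `successors(b, a)` returns the belief supports that may follow `b` under action `a`; both names and the function `shielded_actions` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

BeliefSupport = FrozenSet[str]


def shielded_actions(
    support: BeliefSupport,
    actions: Iterable[str],
    successors: Callable[[BeliefSupport, str], Set[BeliefSupport]],
    winning_region: Set[BeliefSupport],
) -> List[str]:
    """Keep only actions whose every possible successor stays in the region."""
    return [a for a in actions if successors(support, a) <= winning_region]


# Toy example: from {"s0"}, "left" always stays inside the winning region,
# whereas "right" may lead to the belief support {"t"} outside of it.
safe = frozenset({"s0"})
goal = frozenset({"g"})
trap = frozenset({"t"})
table = {(safe, "left"): {safe, goal}, (safe, "right"): {goal, trap}}

allowed = shielded_actions(safe, ["left", "right"], lambda b, a: table[(b, a)], {safe, goal})
print(allowed)  # ['left']
```

Note that the check uses only the possible successors, not their probabilities, which matches the observation that the winning region depends on the topology alone.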


Comparison with the State-of-the-Art. Similar to our approach, [15] solves almost- 
sure specifications using SAT. Intuitively, the aim is to find a so-called simple pol- 
icy that is Markovian (aka memoryless). Such a policy may not exist, yet, the 
method can be applied to a POMDP that has an extended state space to account 
for finite memory [33,37]. There are three shortcomings that our incremental SAT 
approach overcomes. First, one needs to pre-define the memory a policy has at 
its disposal, as well as a fixed lookahead on the exploration of the POMDP. Our 
encoding does not require fixing these hyperparameters a priori. Second, the approach
is only feasible if small memory bounds suffice. Our approach scales to models
that require policies with larger memory bounds. Third, the approach finds a
single simple policy starting from a pre-defined initial state. For safe exploration,
this means that we may exclude many policies and never explore important parts
of the system, harming the final performance of the agent. Instead, we find a
large winning region. Shielding MDPs is not new [2,9,10,30]. However, those
methods neither take partial observability into account, nor can they guarantee
reaching desirable states. Nam and Alur [39] cover partial observability and
reachability, but do not account for stochastic uncertainty. 


Experiments. To showcase the feasibility of our method, we adopted a number of 
typical POMDP environments. We demonstrate that our method scales better 
than the state of the art. We evaluate the shield by letting an agent explore the 
POMDP environment according to the permissive policy, thereby enforcing the
satisfaction of the almost-sure specification. We visualize the resulting behavior 
of the agent in those environments with a set of videos. 


Contributions. Our paper makes four contributions: (1) We present an incre- 
mental SAT-based approach to compute policies that satisfy almost-sure prop- 
erties. The method scales to POMDPs whose belief-support states count billions; 
(2) The novel approach is able to find large winning regions that yield permis- 
sive policies. (3) We implement a straightforward approach that constructs the 
belief-support symbolically using state-of-the-art model checking. We show that 
its completeness comes at the cost of limited scalability. (4) We construct a 
shield for almost-sure specifications on POMDPs which enforces at runtime that 
no unsafe states are visited and that, under mild assumptions, the agent almost- 
surely reaches the set of desirable states. 


Further Related Work. Chatterjee et al. compute winning regions for minimizing 
a reward objective via an explicit state representation [17], or consider almost- 
sure reachability using an explicit state space [16,51]. The problem of determin- 
ing any winning policy can be cast as a strong cyclic planning problem, proposed 
earlier with decision diagrams [7]. Indeed, our BDD-based implementation on the 
belief-support MDP can be seen as a reimplementation of that approach. 

Quantitative variants of reach-avoid specifications have gained attention in, 
e.g., [11,28,40]. Other approaches restrict themselves to simple policies [3, 33, 45, 
58]. Wang et al. [55] use an iterative Satisfiability Modulo Theories (SMT) [6] 
approach for quantitative finite-horizon specifications, which requires computing 
beliefs. Various general POMDP approaches exist, e.g., [26,27,29,48, 49, 54, 56]. 
The underlying approaches depend on discounted reward maximization and can 
satisfy almost-sure specifications with high reliability. However, enforcing prob- 
abilities that are close to 0 or 1 requires a discount factor close to 1, drastically 
reducing the scalability of such approaches [28]. Moreover, probabilities in the 
underlying POMDP need to be precisely given, which is not always realistic [14]. 

Another line of work (for example [53]) uses an idea similar to winning regions 
with uncertain specifications, but in a fully observable setting. Finally, comple- 
mentary to shielding, there are approaches that guide reinforcement learning 
(with full observability) via temporal logic constraints [24, 25]. 


2 Preliminaries and Formal Problem 


We briefly introduce POMDPs and their semantics in terms of belief MDPs, before 
formalising and studying the problem variants outlined in the introduction. We 
present belief-support MDPs as a finite abstraction of infinite belief MDPs. 

We define the support $\mathrm{supp}(\mu) = \{x \in X \mid \mu(x) > 0\}$ of a discrete probability
distribution $\mu$ and denote the set of all distributions with $\mathit{Distr}(X)$.


Definition 1 (MDP). A Markov decision process (MDP) is a tuple
$\mathcal{M} = (S, \mathit{Act}, \mu_{\mathrm{init}}, \mathbf{P})$ with a set $S$ of states, an initial distribution $\mu_{\mathrm{init}} \in \mathit{Distr}(S)$, a
finite set $\mathit{Act}$ of actions, and a transition function $\mathbf{P}\colon S \times \mathit{Act} \to \mathit{Distr}(S)$.
Let $\mathrm{post}_s(a) = \mathrm{supp}(\mathbf{P}(s, a))$ denote the states that may be the successors of the
state $s \in S$ for action $a \in \mathit{Act}$ under the distribution $\mathbf{P}(s, a)$. If $\mathrm{post}_s(a) = \{s\}$
for all actions $a$, $s$ is called absorbing.


Definition 2 (POMDP). A partially observable MDP (POMDP) is a tuple
$\mathcal{P} = (\mathcal{M}, \Omega, \mathrm{obs})$ with $\mathcal{M} = (S, \mathit{Act}, \mu_{\mathrm{init}}, \mathbf{P})$ the underlying MDP with finite
$S$, $\Omega$ a finite set of observations, and $\mathrm{obs}\colon S \to \Omega$ an observation function.
We assume that there is a unique initial observation, i.e., that
$|\{\mathrm{obs}(s) \mid s \in \mathrm{supp}(\mu_{\mathrm{init}})\}| = 1$.


More general observation functions $\mathrm{obs}\colon S \to \mathit{Distr}(\Omega)$ are possible via a
(polynomial) reduction [17]. A path through an MDP is a sequence
$\pi = (s_0, a_0)(s_1, a_1) \ldots s_n$ of states and actions such that $s_{i+1} \in \mathrm{post}_{s_i}(a_i)$ for
$a_i \in \mathit{Act}$ and $0 \le i < n$. The observation function obs applied to a path yields
an observation(-action) sequence $\mathrm{obs}(\pi)$ of observations and actions.

For modeling flexibility, we allow actions to be unavailable in a state (e.g., 
opening doors is only available when at a door), and it turned out to be crucial 
to handle this explicitly in the following algorithms. Technically, the transition 
function is a partial function, and the enabled actions are a set
$\mathit{EnAct}(s) = \{a \in \mathit{Act} \mid \mathrm{post}_s(a) \neq \emptyset\}$. To ease the presentation, we assume that states $s, s'$ with
the same observation share a set of enabled actions $\mathit{EnAct}(s) = \mathit{EnAct}(s')$.


Definition 3 (Policy). A policy $\sigma\colon (S \times \mathit{Act})^* \times S \to \mathit{Distr}(\mathit{Act})$ maps a path $\pi$
to a distribution over actions. A policy is observation-based, if for each two paths
$\pi, \pi'$ it holds that $\mathrm{obs}(\pi) = \mathrm{obs}(\pi') \Rightarrow \sigma(\pi) = \sigma(\pi')$. A policy is memoryless,
if for each $\pi, \pi'$ it holds that $\mathrm{last}(\pi) = \mathrm{last}(\pi') \Rightarrow \sigma(\pi) = \sigma(\pi')$. A policy is
deterministic, if for each $\pi$, $\sigma(\pi)$ is a Dirac distribution, i.e., if $|\mathrm{supp}(\sigma(\pi))| = 1$.


Policies resolve nondeterminism and partial observability by turning a (PO)MDP
into the induced infinite discrete-time Markov chain whose states are the finite 
paths of the (PO)MDP. Probability measures are defined on this Markov chain. 

For POMDPs, a belief describes the probability of being in a certain state based
on an observation sequence. Formally, a belief $b$ is a distribution $b \in \mathit{Distr}(S)$
over the states. A state $s$ with positive belief $b(s) > 0$ is in the belief support,
$s \in \mathrm{supp}(b)$. Let $\Pr_b^\sigma(S')$ denote the probability to reach a set $S' \subseteq S$ of states
from belief $b$ under the policy $\sigma$. More precisely, $\Pr_b^\sigma(S')$ denotes the probability
of all paths that reach $S'$ from $b$ when nondeterminism is resolved by $\sigma$.

The policy synthesis problem usually consists in finding a policy that satisfies 
a certain specification for a POMDP. We consider reach-avoid specifications, a 
subclass of indefinite horizon properties [46]. For a POMDP P with states S, 
such a specification is $\varphi = (\mathit{REACH}, \mathit{AVOID}) \subseteq S \times S$. We assume that states
in AVOID and in REACH are (made) absorbing and $\mathit{REACH} \cap \mathit{AVOID} = \emptyset$.


Definition 4 (Winning). A policy $\sigma$ is winning for $\varphi$ from belief $b$ in
(PO)MDP $\mathcal{P}$ iff $\Pr_b^\sigma(\mathit{AVOID}) = 0$ and $\Pr_b^\sigma(\mathit{REACH}) = 1$, i.e., if it reaches
AVOID with probability zero and REACH with probability one (almost-surely)
when $b$ is the initial state. Belief $b$ is winning for $\varphi$ in $\mathcal{P}$ if there exists a winning
policy from $b$.
We omit $\mathcal{P}$ and $\varphi$ whenever it is clear from the context and simply call $b$ winning.


Problem 1: Given a POMDP, a belief $b$, and a specification $\varphi$, decide
whether $b$ is winning and find a policy $\sigma$ that is winning from $b$.


The problem is EXPTIME-complete [18]. Contrary to MDPs, it is not sufficient 
to consider memoryless policies. 

Model checking queries for POMDPs often rely on the analysis of the belief 
MDP. Indeed, we may analyse this generally infinite model. Let us first recap 
a formal definition of the belief MDP, using the presentation from [11]. In the 
following, let $\mathbf{P}(s, a, z) := \sum_{s' \in S} [\mathrm{obs}(s') = z] \cdot \mathbf{P}(s, a, s')$ denote the probability¹
to move to (a state with) observation $z$ from state $s$ using action $a$. Then,
$\mathbf{P}(b, a, z) := \sum_{s \in S} b(s) \cdot \mathbf{P}(s, a, z)$ is the probability to observe $z$ after taking $a$
in $b$. We define the belief obtained by taking $a$ from $b$, conditioned on observing $z$:

$$\mathrm{update}(b|a, z)(s') := \frac{[\mathrm{obs}(s') = z] \cdot \sum_{s \in S} b(s) \cdot \mathbf{P}(s, a, s')}{\mathbf{P}(b, a, z)} \qquad (1)$$
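
The belief update in Eq. (1) can be transcribed almost literally. The following is a minimal sketch; the dictionary-based encodings of beliefs, transitions, and observations as well as the function names are assumptions made for illustration.

```python
from typing import Dict, Hashable, Tuple

State = Hashable
Belief = Dict[State, float]
# P[(s, a)] is the successor distribution of state s under action a.
Transitions = Dict[Tuple[State, str], Dict[State, float]]


def observation_probability(b: Belief, a: str, z, P: Transitions, obs: Dict) -> float:
    """P(b, a, z): probability to observe z after taking a in belief b."""
    return sum(
        b[s] * p
        for s in b
        for t, p in P[(s, a)].items()
        if obs[t] == z
    )


def update(b: Belief, a: str, z, P: Transitions, obs: Dict) -> Belief:
    """Belief obtained by taking a from b, conditioned on observing z (Eq. 1)."""
    norm = observation_probability(b, a, z, P, obs)
    if norm == 0.0:
        raise ValueError("observation z has probability zero after action a")
    new_belief: Belief = {}
    for s, weight in b.items():
        for t, p in P[(s, a)].items():
            if obs[t] == z and weight * p > 0.0:
                new_belief[t] = new_belief.get(t, 0.0) + weight * p / norm
    return new_belief


# Two indistinguishable states; action "go" leads to u, v (observation "x")
# or to w (observation "y").
P = {("s0", "go"): {"u": 0.5, "v": 0.5}, ("s1", "go"): {"v": 0.5, "w": 0.5}}
obs = {"s0": "dark", "s1": "dark", "u": "x", "v": "x", "w": "y"}
b0 = {"s0": 0.5, "s1": 0.5}

print(observation_probability(b0, "go", "x", P, obs))  # 0.75
print(update(b0, "go", "x", P, obs))  # u with 1/3, v with 2/3
```

Every state in the updated belief carries the observation `z`, which is the property used below to restrict beliefs to states with a common observation.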


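Definition 5 (Belief MDP). The belief MDP of POMDP $\mathcal{P} = (\mathcal{M}, \Omega, \mathrm{obs})$
where $\mathcal{M} = (S, \mathit{Act}, \mu_{\mathrm{init}}, \mathbf{P})$ is the MDP $\mathrm{BelMDP}(\mathcal{P}) := (B, \mathit{Act}, \mathbf{P}_B, \mu_{\mathrm{init}})$ with
$B = \mathit{Distr}(S)$, and transition function $\mathbf{P}_B$ given by

$$\mathbf{P}_B(b, a, b') := \begin{cases} \mathbf{P}(b, a, \mathrm{obs}(b')) & \text{if } b' = \mathrm{update}(b|a, \mathrm{obs}(b')), \\ 0 & \text{otherwise.} \end{cases}$$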

Due to (1) and the unique initial observation, we may restrict the beliefs to
$B = \bigcup_{z \in \Omega} \mathit{Distr}(\{s \mid \mathrm{obs}(s) = z\})$, that is, each belief state has a unique associated
observation. We can lift specifications to belief MDPs: Avoid-beliefs are the set
of beliefs $b$ such that $\mathrm{supp}(b) \cap \mathit{AVOID} \neq \emptyset$, and reach-beliefs are the set of
beliefs $b$ such that $\mathrm{supp}(b) \subseteq \mathit{REACH}$.

Towards obtaining a finite abstraction, the main algorithmic idea is the fol- 
lowing. For the qualitative reach-avoid specifications we consider, the belief prob- 
abilities are irrelevant—only the belief support is important [47]. 


Lemma 1. For winning belief b, belief b’ with supp(b) = supp(b’) is winning. 

Consequently, we can abstract the belief MDP into a finite belief support MDP.

Definition 6 (Belief-Support MDP). For a POMDP $\mathcal{P} = (\mathcal{M}, \Omega, \mathrm{obs})$ with
$\mathcal{M} = (S, \mathit{Act}, \mu_{\mathrm{init}}, \mathbf{P})$, the finite state space of a belief-support MDP $\mathcal{P}_B$ is
$B = \{b \subseteq S \mid \forall s, s' \in b\colon \mathrm{obs}(s) = \mathrm{obs}(s')\}$ where each state is the support of
a belief state. Action $a$ in state $b$ leads (with an irrelevant positive probability
$p > 0$) to a state $b'$, if

$$b' \in \Big\{ \bigcup_{s \in b} \mathrm{post}_s(a) \cap \{s \mid \mathrm{obs}(s) = z\} \;\Big|\; z \in \Omega \Big\}.$$
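
The successor relation of the belief-support MDP can be computed by collecting all POMDP successors of the states in a support and grouping them by observation. The sketch below follows Definition 6 under the assumption that only non-empty successor supports are of interest; the encoding of `post` and `obs` as dictionaries and the name `support_successors` are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# post[(s, a)] is the set of possible successors of state s under action a.
Post = Dict[Tuple[str, str], Set[str]]


def support_successors(
    b: FrozenSet[str], a: str, post: Post, obs: Dict[str, str]
) -> Set[FrozenSet[str]]:
    """Successor belief supports of b under action a, one per observation."""
    reached = set().union(*(post[(s, a)] for s in b))
    by_observation: Dict[str, Set[str]] = {}
    for t in reached:
        by_observation.setdefault(obs[t], set()).add(t)
    return {frozenset(group) for group in by_observation.values()}


# From the support {s0, s1}, action "go" may reach u, v (observation "x")
# or w (observation "y").
post = {("s0", "go"): {"u", "w"}, ("s1", "go"): {"v"}}
obs = {"s0": "dark", "s1": "dark", "u": "x", "v": "x", "w": "y"}

succ = support_successors(frozenset({"s0", "s1"}), "go", post, obs)
print(succ == {frozenset({"u", "v"}), frozenset({"w"})})  # True
```

No probabilities appear in this computation, in line with the remark that transitions of the belief-support MDP carry an irrelevant positive probability.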


1 We use Iverson brackets: [x] = 1 if x holds and 0 otherwise. 
Thus, transitions between states within $b$ and $b'$ are mimicked in the POMDP.
Equivalently, the following clarifies the belief-support MDP as an abstraction of
the belief MDP: there are transitions with action $a$ between $b$ and $b'$, if there
exist beliefs $\hat{b}, \hat{b}'$ with $\mathrm{supp}(\hat{b}) = b$ and $\mathrm{supp}(\hat{b}') = b'$, such that $\hat{b}' \in \mathrm{post}_{\hat{b}}(a)$.
We lift the specification as before:


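Definition 7 (Lifted specification). For $\varphi = (\mathit{AVOID}, \mathit{REACH})$, we define
$\varphi_B = (\mathit{AVOID}_B, \mathit{REACH}_B)$ with $\mathit{AVOID}_B = \{b \mid b \cap \mathit{AVOID} \neq \emptyset\}$, and
$\mathit{REACH}_B = \{b \mid b \subseteq \mathit{REACH}\}$.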


We obtain the following lemma, which follows from the fact that almost-sure
reachability is a graph property².


Lemma 2. If belief $b$ is winning in the POMDP $\mathcal{P}$ for $\varphi$, then the support
$\mathrm{supp}(b)$ is winning in the belief-support MDP $\mathcal{P}_B$ for $\varphi_B$.


Lemma 2 yields an equivalent reformulation of Problem 1 for belief supports: 


Problem 1 (equivalent): Given a POMDP $\mathcal{P}$, belief $b$, and specification
$\varphi$, decide whether $\mathrm{supp}(b)$ is winning for $\varphi_B$ in the belief-support MDP $\mathcal{P}_B$.
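
Since almost-sure reachability is a graph property, the reformulated problem can in principle be decided by a fixed-point computation on the graph of the belief-support MDP. The sketch below uses the textbook nested fixed point for almost-sure reachability in MDPs on an explicitly given graph; it is neither the symbolic nor the SAT-based implementation discussed in this paper, and the encoding and the name `almost_sure_reach_avoid` are assumptions.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
# post[(b, a)] is the set of possible successors of b under action a.
Post = Dict[Tuple[Node, str], Set[Node]]


def almost_sure_reach_avoid(
    states: Set[Node], post: Post, reach: Set[Node], avoid: Set[Node]
) -> Set[Node]:
    """States from which some policy reaches `reach` with probability one
    while never visiting `avoid`."""
    candidate = set(states) - set(avoid)
    while True:
        # Inner fixed point: states that can reach `reach` using only
        # actions whose successors all remain inside `candidate`.
        can_reach = set(reach) & candidate
        changed = True
        while changed:
            changed = False
            for (s, a), succ in post.items():
                if (s in candidate and s not in can_reach
                        and succ <= candidate and succ & can_reach):
                    can_reach.add(s)
                    changed = True
        if can_reach == candidate:
            return candidate
        # Outer fixed point: shrink the candidate set and repeat.
        candidate = can_reach


# State 0 may loop under "a" but eventually reaches the goal 2 with
# probability one; state 1 cannot avoid the risk of entering the bad state 3.
post = {
    (0, "a"): {0, 2}, (0, "b"): {3},
    (1, "a"): {2, 3},
    (2, "a"): {2}, (3, "a"): {3},
}
winning = almost_sure_reach_avoid({0, 1, 2, 3}, post, reach={2}, avoid={3})
print(sorted(winning))  # [0, 2]
```

State 0 illustrates the difference to sure reachability: the infinite path that loops in 0 forever exists, but it has probability zero.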


3 Winning Regions 


This section provides observations on winning regions, a key concept for this
paper. An important consequence of Lemma 2 and the reformulation of Problem 1
to the belief-support MDP is that the initial distribution of the POMDP
is no longer relevant. Winning policies for individual beliefs may be composed 
to a policy that is winning for all of these beliefs, using the individual action 
choices. 


Lemma 3. If the policies $\sigma$ and $\sigma'$ are winning for the belief supports $b$ and $b'$,
respectively, then there exists a policy $\sigma''$ that is winning for both $b$ and $b'$.


While this statement may seem trivial on the MDP (or equivalently on beliefs), 
we notice that it does not hold for POMDP states. As a natural consequence, 
we are able to consider winning beliefs without referring to a specific policy. 


Definition 8 (Winning region). Let $\sigma$ be a policy. A set $W_\varphi^\sigma \subseteq B$ of belief
supports is a winning region for $\varphi$ and $\sigma$, if $\sigma$ is winning from each $b \in W_\varphi^\sigma$. A
set $W_\varphi \subseteq B$ is a winning region for $\varphi$, if every $b \in W_\varphi$ is winning. The region
containing all winning beliefs is the maximal winning region³.


2 Although the probabilities are not relevant to compute almost-sure reachability,
it is important to notice that almost-sure reachability is different from sure-reachability [5]:
For almost-sure reachability, there can be an infinite path that
never reaches the target, as long as the probability mass over all those paths is 
0. Almost-sure reachability can, however, be expressed as sure-reachability in a par- 
ticular game-setting [47]. 

3 In some literature, winning region always refers to a maximal winning region. 
Observe that the maximal winning region in MDPs exists for qualitative reachability,
but not for quantitative reachability, which we do not consider here.


Problem 2: Given a POMDP P and a specification φ, find the maximal
winning region W_φ.


Using this definition of winning regions, we are able to reformulate Problem 1 
by asking whether the support of some belief b is in the winning region. 

Part of Problem 1 was to compute a winning policy. Below, we study the 
connection between the winning region and winning policies. We are interested 
in subsets of the maximal winning region that exhibit two properties: 


Definition 9 (Deadlock-free). A set of belief supports W ⊆ B is
deadlock-free, if for every b ∈ W, an action a ∈ EnAct(b) exists such that
post_b(a) ⊆ W.


Definition 10 (Productive). A set of belief supports W ⊆ B is productive
(towards a set REACH_B), if from every b ∈ W, there exists a (finite) path
π = b_0 a_1 b_1 … b_n from b_0 to b_n ∈ REACH_B with b_i ∈ W and post_{b_{i−1}}(a_i) ⊆ W for
all 1 ≤ i ≤ n.


Every productive region is deadlock-free, as REACH-states are absorbing. The
maximal winning region is productive towards REACH_B (and thus deadlock-free)
by definition. Intuitively, while a deadlock-free region ensures that one
never has to leave the region, a productive winning region additionally ensures that from
every belief support within this region there is a policy that stays in the winning
region and almost-surely reaches a REACH-state. In particular, to find a
winning policy (Challenge 1) or for the purpose of safe exploration (Challenge 2),
it is sufficient to find a productive subset of the maximal winning region. We
elaborate on this insight in Sect. 6.
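
The two properties of Definitions 9 and 10 can be checked directly on an explicit belief-support graph. The following Python sketch is only an illustration under assumptions of our own: belief supports are opaque labels, post maps a pair (belief support, action) to the set of successor belief supports, and en_act lists the enabled actions. None of these names come from the implementation discussed later.

```python
def is_deadlock_free(region, en_act, post):
    """Definition 9: every b in region has an action that never leaves region."""
    return all(any(post[b, a] <= region for a in en_act[b]) for b in region)


def is_productive(region, en_act, post, reach_b):
    """Definition 10: every b in region has a finite path to reach_b that uses
    only actions whose successors all stay inside region."""
    good = {b for b in region if b in reach_b}  # paths of length zero
    changed = True
    while changed:  # backward fixpoint from the target belief supports
        changed = False
        for b in region - good:
            for a in en_act[b]:
                succ = post[b, a]
                if succ <= region and succ & good:
                    good.add(b)
                    changed = True
                    break
    return good == region


# Hypothetical belief-support graph.
post = {
    ("b0", "a"): {"b0", "b1"}, ("b0", "c"): {"bad"},
    ("b1", "a"): {"goal"},
    ("goal", "a"): {"goal"},
    ("loop", "a"): {"loop"},
}
en_act = {"b0": ["a", "c"], "b1": ["a"], "goal": ["a"], "loop": ["a"]}
reach_b = {"goal"}
region = {"b0", "b1", "goal"}
```

In this example the region {b0, b1, goal} is both deadlock-free and productive, whereas {loop, goal} is deadlock-free but not productive: from loop one can stay in the region forever without ever making progress.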


Problem 3: Given a POMDP P and a specification φ, find a (large) productive
winning region W_φ.


To allow a compact representation of winning regions, we exploit that for any
belief support b′ ⊆ b it holds that post_{b′}(a) ⊆ post_b(a) for all actions a ∈ Act,
that is, the successors of b′ are contained in the successors of b.


Lemma 4. For a winning belief support b, every b′ ⊆ b is winning.


4 Iterative SAT-Based Computation of Winning Regions 


We devise an approach for iteratively computing an increasing sequence of pro- 
ductive winning regions. The approach delivers a compact symbolic encoding 
of winning regions: For a belief (or belief-support) state from a given winning 
region, we can efficiently decide whether the outcome of an action emanating 
from the state stays within the winning region. 

A key ingredient is the computation of so-called memoryless winning policies.
We start this section by briefly recapping how to compute such policies directly
on the POMDP, before we build an efficient incremental approach on top of this
base method. In particular, we first present a naive iterative algorithm based on
the notion of shortcuts, then describe how to implicitly add shortcuts within the
encoding, and finally combine the ideas into an efficient algorithm.

Fig. 1. Cheese-Maze example to explain memoryless policies and shortcuts


4.1 One-Shot Approach to Find Small Policies from a Single Belief 


We aim to solve Problem 1 and determine a winning policy. The number of
policies is exponential in the actions and the (exponentially many) belief support
states. Searching among doubly exponentially many possibilities is intractable in
general. However, Chatterjee et al. [15] observe that often much simpler winning
policies exist and provide a one-shot approach to find them. The essential idea
is to search only for memoryless observation-based policies σ: Ω → Distr(Act)
that are winning for the (initial) belief support b.
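
To make the search space concrete, the following Python sketch enumerates memoryless observation-based policies by brute force and checks each candidate with a graph analysis. It deliberately replaces the SAT query discussed below by plain enumeration, so it only scales to toy models. Since only the supports of the distributions matter for almost-sure reachability, a candidate policy is represented by a nonempty set of actions per observation. All names and the example POMDP are assumptions made for illustration.

```python
from itertools import chain, combinations, product


def nonempty_subsets(items):
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)
    )


def reachable(start, succ):
    """All states reachable from start via the successor function succ."""
    seen, stack = set(start), list(start)
    while stack:
        s = stack.pop()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def find_memoryless_policy(obs, en_act, post, avoid, reach, init_support):
    observations = sorted(set(obs.values()))
    choices = [list(nonempty_subsets(en_act[z])) for z in observations]
    for combo in product(*choices):
        policy = dict(zip(observations, combo))

        def succ(s, policy=policy):
            if s in reach:  # REACH-states are treated as absorbing
                return set()
            return set().union(*(post[s, a] for a in policy[obs[s]]))

        reached = reachable(init_support, succ)
        if reached & avoid:
            continue  # an AVOID-state is reached with positive probability
        # Almost-sure reachability: every reached state can still reach REACH.
        if all(reachable({s}, succ) & reach for s in reached):
            return policy
    return None


# Hypothetical POMDP: states 0 and 1 share observation "x" but need
# different actions to reach the goal state 2 and avoid the bad state 3.
obs = {0: "x", 1: "x", 2: "goal", 3: "bad"}
en_act = {"x": ["l", "r"], "goal": ["l"], "bad": ["l"]}
post = {(0, "l"): {2}, (0, "r"): {3}, (1, "l"): {3}, (1, "r"): {2},
        (2, "l"): {2}, (3, "l"): {3}}
policy_from_0 = find_memoryless_policy(obs, en_act, post, {3}, {2}, {0})
policy_from_01 = find_memoryless_policy(obs, en_act, post, {3}, {2}, {0, 1})
```

From the belief support {0} a memoryless policy exists (take l on observation x), whereas from {0, 1} none exists, because any action that is safe in one state is unsafe in the other. The number of candidates grows exponentially with the number of observations, which motivates the symbolic encoding.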


Example 1. Consider the small Cheese-POMDP [35] in Fig. 1(a). States are cells,
actions are moving in the cardinal directions (if possible), and observations are
the directions with adjacent cells, e.g., the boldface states 6,7,8 share an observation.
We set REACH = {10} and AVOID = {9,11}. From belief support
b = {6,8} there is no memoryless winning policy: in states {6,8} we have to
go north, which prevents us from going south in state 7. However, we can find a
memoryless winning policy for {1,5}, see Fig. 1(b).


This problem is NP-complete, and it is thus natural to encode the problem as a
satisfiability query in propositional logic. We mildly adapt the original encoding
of winning policies [15]. We introduce three sets of Boolean variables: A_{z,a}, C_s,
and P_{s,j}. If a policy takes action a ∈ Act with positive probability upon observation
z ∈ Ω, then and only then, A_{z,a} is true. If under this policy a state s ∈ S
is reached from some initial belief support b_ι with positive probability, then and
only then, C_s is true. We define a maximal rank k to ensure productivity.
For each state s and rank 0 ≤ j ≤ k, variable P_{s,j} indicates rank j for s, that
is, a path from s leads to some s′ ∈ REACH within j steps.⁴ A winning policy is
then obtained by finding a satisfying assignment (via a SAT solver) to the conjunction
Φ^φ_P(b_ι, k) of the constraints (2a)-(5), where S_? = S \ (AVOID ∪ REACH).


4 Notice that a state s can have multiple ‘ranks’ in this encoding. Its rank is the
smallest j such that P_{s,j} is true.




\[ \bigwedge_{s \in b_\iota} C_s \tag{2a} \]
\[ \bigwedge_{z \in \Omega} \; \bigvee_{a \in \mathit{EnAct}(z)} A_{z,a} \tag{2b} \]


The initial belief support is clearly reachable (2a). The conjunction in (2b) 
ensures that in every observation, at least one action is taken. 


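\[ \bigwedge_{s \in \mathit{AVOID}} \neg C_s \;\land\; \bigwedge_{\substack{s \in S \\ a \in \mathit{EnAct}(s)}} \; \bigwedge_{s' \in \mathit{post}_s(a)} \big( C_s \land A_{\mathit{obs}(s),a} \rightarrow C_{s'} \big) \tag{3} \]

The conjunction (3) ensures that for any model of these formulas, the set of
states {s ∈ S | C_s = true} is reachable, does not overlap with AVOID, and is
transitively closed under reachability (for the policy described by A_{z,a}).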


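\[ \bigwedge_{s \in S_?} C_s \rightarrow P_{s,k} \tag{4} \]
\[ \bigwedge_{s \notin \mathit{REACH}} \neg P_{s,0} \;\land\; \bigwedge_{\substack{s \in S_? \\ 1 \le j \le k}} P_{s,j} \leftrightarrow \Big( \bigvee_{a \in \mathit{EnAct}(s)} \big( A_{\mathit{obs}(s),a} \land \big( \bigvee_{s' \in \mathit{post}_s(a)} P_{s',j-1} \big) \big) \Big) \tag{5} \]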


Conjunction (4) states that any state that is reached almost-surely reaches a
state in REACH, i.e., that there is a path of length at most k to the target.
Conjunction (5) describes a ranking function that ensures the existence of this
path. Only states in REACH have rank zero, and a state with positive probability
to reach a state with rank j−1 within a step has rank at most j.
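
The role of the maximal rank k can be illustrated by computing the smallest rank of each state under a fixed policy. The Python sketch below does this with a layered backward search. It assumes that succ[s] already contains the states reached from s with positive probability under the policy, and the function name minimal_ranks is ours.

```python
def minimal_ranks(states, reach, succ):
    """Smallest j per state such that REACH is reachable within j steps.

    States that cannot reach REACH at all receive no rank.
    """
    rank = {s: 0 for s in reach}  # only REACH-states have rank zero
    frontier, j = set(reach), 0
    while frontier:
        j += 1
        # A state gets rank j if some successor has rank j - 1.
        frontier = {
            s for s in states if s not in rank and succ[s] & frontier
        }
        for s in frontier:
            rank[s] = j
    return rank


# Hypothetical chain 0 -> 1 -> 2 -> 3 with a self-loop on 1 and a sink 4.
succ = {0: {1}, 1: {1, 2}, 2: {3}, 3: {3}, 4: {4}}
ranks = minimal_ranks(states={0, 1, 2, 3, 4}, reach={3}, succ=succ)
```

Here state 0 has rank 3, so an encoding with k < 3 would not certify the policy as winning from state 0, even though the target is reached almost-surely despite the self-loop on state 1. State 4 receives no rank at all.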

By [15, Thm. 2], the conjunction Φ^φ_P(b_ι, k) of the constraints (2a)-(5)
is satisfiable only if there is a memoryless observation-based policy
such that φ is satisfied. If k = |S|, then the converse also holds. If
k < |S|, we may miss states with a higher rank. Large values for k are practically
intractable [15], as the encoding grows significantly with k. Pandey and Rintanen
[41] propose extending SAT solvers with a dedicated handling of ranking
constraints.

In order to apply this to small-memory policies, one can unfold log(m) bits of 
memory of such a policy into an m times larger POMDP [15,33], and then search 
for a memoryless policy in this larger POMDP. Chatterjee et al. [15] include a 
slight variation to this unfolding, allowing smaller-than-memoryless policies by 
enforcing the same action over various observations. 


4.2 Iterative Shortcuts 


We exploit the one-shot approach to create a naive iterative algorithm that con- 
structs a productive winning region. The iterative algorithm avoids the following 
restrictions of the one-shot approach. (1) In order to increase the likelihood of 
finding winning policies, we do not restrict ourselves to small-memory policies, 
and (2) we do not have to fix a maximal rank k. These modifications allow us 
to find more winning policies, without guessing hyper-parameters. As we do not 
need to fix the belief-state, those parts of the winning region that are easy to 
find for the solver are encountered first. 
The One-Shot Approach on Winning Regions. To understand the naive iterative
algorithm, it is helpful to consider the previous encoding in the light of Problem
3, i.e., finding productive winning regions. Consider first the interpretation of
the variables. Indeed, observe that we have found the same winning policy for
all states s where C_s is true. Consequently, any belief support b_z = {s |
C_s = true ∧ obs(s) = z} is winning.


Lemma 5. If σ is winning for b and b′, then σ is also winning for b ∪ b′.


This lemma is somewhat dual to Lemma 4, but requires a fixed policy. The
constraints (3) ensure that a winning region is deadlock-free. The constraints
(4) and (5) ensure productivity of the winning region.


Adding Shortcuts Explicitly. The key idea is that we iteratively add shortcuts
in the POMDP that represent known winning policies. We find a winning policy
σ for some belief states in the first iteration, and then add a fresh action a_σ
to all (original) POMDP states: This action leads, with probability one, to
a REACH state if the state is in the winning belief support under policy σ.
Otherwise, the action leads to an AVOID state.


Definition 11. For a POMDP P = (M, Ω, obs) where M = (S, Act, μ_init, P)
and a policy σ with associated winning region W^σ_φ, and assuming w.l.o.g. ⊤ ∈
REACH and ⊥ ∈ AVOID, we define the shortcut POMDP P{σ} = (M′, Ω, obs)
with M′ = (S, Act′, μ_init, P′), Act′ = Act ∪ {a_σ}, P′(s, a) = P(s, a) for all s ∈ S
and a ∈ Act, and P′(s, a_σ) = {⊤ ↦ [{s} ∈ W^σ_φ], ⊥ ↦ [{s} ∉ W^σ_φ]}.


Lemma 6. For a POMDP P and policy σ, the (maximal) winning regions for
P{σ} and P coincide.


First, adding more actions will not change a winning belief-support to be not 
winning. Furthermore, by construction, taking the novel action will only lead to 
a winning belief-support whenever following σ from that point onwards would
be a winning policy. The key benefit is that adding shortcuts may extend the 
set of belief-support states that win via a memoryless policy. This observation 
also gives rise to the following extension to the one-shot approach. 
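
The construction of Definition 11 is small enough to state as code. The following Python sketch adds one shortcut action to an explicit successor map. It assumes that the set winning_states collects exactly the states s with {s} in the winning region of the policy, and that top and bottom are designated REACH and AVOID states. All identifiers are hypothetical.

```python
def add_shortcut(post, states, winning_states, shortcut, top, bottom):
    """Return a copy of post extended with one shortcut action (cf. Def. 11)."""
    extended = dict(post)  # the original actions keep their behaviour
    for s in states:
        # The shortcut summarises "follow the known winning policy from s".
        extended[s, shortcut] = {top} if s in winning_states else {bottom}
    return extended


# Hypothetical example: the known policy wins from state 0 but not from 1.
post = {(0, "a"): {1}, (1, "a"): {0}}
extended = add_shortcut(
    post, states={0, 1}, winning_states={0},
    shortcut="a_sigma", top="T", bottom="B",
)
```

Because the shortcut leads to an AVOID-state wherever the policy is not known to win, a solver can only profit from it in states that are already covered, which is the intuition behind Lemma 6.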


Example 2. We continue with Example 1. If we add shortcuts, we can now find 
a memoryless winning policy for b = {6,8}, depicted in Fig. 1(c).


Iterative Shortcuts to Extend a Winning Region. The idea is now to run the one- 
shot approach, extract the winning region, add the shortcuts to the POMDP, and 
rerun the one-shot approach. To make the one-shot approach applicable in this 
setting, it only needs one change: Rather than fixing an initial belief-support, 
we ask for an arbitrary new belief-support to be added to the states that we 
have previously covered. We use a data structure Win such that Win(z) encodes 
all winning belief supports with observation z. Internally, the data structure 
stores maximal winning belief supports (w.r.t. set inclusion, see also Lemma 4) 
as bit-vectors. By construction, for every b ∈ Win(z), a winning region exists,
i.e., conceptually, there is a shortcut-action leading to REACH. 
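
A data structure in the spirit of Win can be sketched as follows. The Python class below stores, per observation, only the maximal winning belief supports as integer bit masks and answers membership queries via Lemma 4. The class and method names are assumptions for illustration and do not describe the actual implementation.

```python
class WinningRegion:
    """Per observation, keep only maximal winning belief supports (bit masks)."""

    def __init__(self):
        self.maximal = {}  # observation -> list of bit masks

    def contains(self, z, mask):
        # By Lemma 4, every subset of a stored winning support is winning.
        return any((mask & ~m) == 0 for m in self.maximal.get(z, []))

    def add(self, z, mask):
        """Insert a winning support; return False if it was already covered."""
        if self.contains(z, mask):
            return False
        # Drop stored supports that the new, larger support subsumes.
        kept = [m for m in self.maximal.get(z, []) if (m & ~mask) != 0]
        kept.append(mask)
        self.maximal[z] = kept
        return True


win = WinningRegion()
first = win.add("z", 0b011)      # states 0 and 1 (bit i stands for state i)
covered = win.contains("z", 0b001)
missing = win.contains("z", 0b100)
second = win.add("z", 0b111)     # subsumes the earlier entry
third = win.add("z", 0b010)      # already covered, nothing to do
```

Storing only maximal elements keeps the structure small, and the subset test is a single bitwise operation per stored support.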
Algorithm 1 Naive construction of winning regions
Input: POMDP P, reach-avoid specification φ
Output: Winning region encoded in Win
  Win(z) ← {s ∈ REACH | obs(s) = z} for all z ∈ Ω
  Φ ← Encode(P, φ, Win)                ▷ Create encoding (2b),(3),(6),(7).
  while ∃η s.t. η ⊨ Φ do               ▷ Call an SMT solver
      Win(z) ← Win(z) ∪ {b | s ∈ b iff η(C_s)} for all z ∈ Ω
      P ← P{σ_η}                       ▷ Extend POMDP with Def. 11,
                                       ▷ with σ_η the policy encoded by η.
      Φ ← Encode(P, φ, Win)


We extend the encoding (in partial preparation of the next subsection) and
add a Boolean variable U_z that is true if the policy is winning in a belief support
that is not yet in Win(z). We replace (2a) with:


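\[ \bigvee_{z \in \Omega} U_z \;\land\; \bigwedge_{\substack{z \in \Omega \\ \mathit{Win}(z) = \emptyset}} \Big( U_z \leftrightarrow \bigvee_{\substack{s \in S \\ \mathit{obs}(s) = z}} C_s \Big) \;\land\; \bigwedge_{\substack{z \in \Omega \\ \mathit{Win}(z) \neq \emptyset}} \Big( U_z \leftrightarrow \bigwedge_{X \in \mathit{Win}(z)} \; \bigvee_{\substack{s \in S \setminus X \\ \mathit{obs}(s) = z}} C_s \Big) \tag{6} \]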


For an observation z for which we have not found a winning belief support
yet, finding a policy from any state s with obs(s) = z updates the winning region.
Otherwise, it means finding a winning policy for a belief support that is not
subsumed by a previous one (6).


Real-Valued Ranking. To avoid setting a maximal path length, we use unbounded
(real) variables R_s rather than Boolean variables for the ranking [57]. This relaxation
avoids the growth of the encoding and admits arbitrarily large ranks with
a fixed-size encoding into difference logic. This logic is an extension of propositional
logic that can be checked using an SMT solver [6].


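\[ \bigwedge_{s \in S_?} C_s \rightarrow \Big( \bigvee_{a \in \mathit{EnAct}(s)} \big( A_{\mathit{obs}(s),a} \land \big( \bigvee_{s' \in \mathit{post}_s(a)} R_s > R_{s'} \big) \big) \Big) \tag{7} \]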


We replace (4) and (5): A state must have a successor state with a lower rank,
as before, but with real-valued ranks (7).
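
To see why the strict inequalities in (7) still rule out unproductive cycles, consider a small hypothetical instance: two states s, t ∈ S_? with C_s and C_t true, where the selected action leads from s only to t. If t in turn leads only back to s, the constraints are unsatisfiable. If t can additionally reach some g ∈ REACH, a satisfying ranking exists:

```latex
\begin{align*}
\text{cycle only:} \quad & R_s > R_t \;\land\; R_t > R_s
  && \text{(unsatisfiable)} \\
\text{with exit } g\text{:} \quad & R_s > R_t \;\land\; (R_t > R_s \lor R_t > R_g)
  && \text{(satisfied by } R_s = 2,\; R_t = 1,\; R_g = 0\text{)}
\end{align*}
```

No bound on the values of R is needed, which is why the size of the encoding no longer depends on a maximal rank.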


Algorithm. Together, the algorithm is given in Algorithm 1. We initialize the 
winning region based on the specification, then encode the POMDP using the 
(modified) one-shot encoding. As long as the SMT solver finds policies that are 
winning for a new belief-support, we add those belief supports to the winning 
region. In each iteration, Win contains a winning region. Once we find no more 
policies that extend the winning region on the extended POMDP, we terminate. 

The algorithm always terminates because the set of winning regions is finite,
but in general does not solve Problem 2. Formally, the maximal winning region
is a greatest fixpoint [5] and we iterate from below, i.e., the fixpoint that we find
will be the smallest fixpoint (of the operation that we implement). However, iterating
from above requires reasoning that none of the doubly-exponentially many
policies is winning for a particular belief support state, whereas our approach
profits from finding simple strategies early on. Unfolding of memory as discussed
earlier also makes this algorithm complete, yet suffers from the same blow-up.
A main advantage is that the algorithm often avoids the need for unfolding when
searching for a winning policy or large winning regions.

Next, we address two weaknesses: First, the algorithm currently creates a new 
encoding in every iteration, yielding significant overhead. Second, the algorithm 
in many settings requires adding a bit of memory to realize behavior where in 
a particular observation, we first want to execute an action a and then follow 
a shortcut from the state (with the same observation) reached from there. We 
adapt the encoding to explicitly allow for these (non-memoryless) policies. 


4.3 Incremental Encoding of Winning Regions 


In this section, instead of naively adjusting the POMDP, we realize the idea of 
adding shortcuts directly on the encoding. This encoding is the essential step 
towards an efficacious approach for solving Problem 3. We find winning states 
based on a previous solution, and instead of adding actions, we allow the solver 
to decide to follow individual policies from each observation. In Sect. 4.4, we
embed this encoding into an improved algorithm.

Our encoding represents an observation-based policy that can decide to take 
a shortcut, which means that it follows a previously computed winning policy 
from there (implicitly using Lemma 3). In addition to A_{z,a}, C_s, and R_s from the
previous encoding, we use the following variables: The policy takes shortcuts in
states s where D_s is true. For each observation, we must take the same shortcut,
referred to by a positive integer-valued index I_z. More precisely, I_z refers to a
shortcut from a previously computed (fragment of a) winning region stored in
Win(z)_{I_z}. The policy may decide to switch, that is, to follow a shortcut after
taking an action starting in a state with observation z. If F_z is true, the policy
takes some action from z-states and from the next state, we take a shortcut. The
encoding thus implicitly represents policies that are not memoryless but rather
allow for a particular type of memory.
The conjunction of (6) and (8)-(13) yields the encoding Φ^φ_P(Win):


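\[ \bigwedge_{z \in \Omega} \; \bigvee_{a \in \mathit{EnAct}(z)} A_{z,a} \;\land\; \bigwedge_{s \in \mathit{AVOID}} \neg C_s \tag{8} \]
\[ \bigwedge_{\substack{s \in S \\ a \in \mathit{EnAct}(s)}} \Big( C_s \land A_{\mathit{obs}(s),a} \land \neg F_{\mathit{obs}(s)} \rightarrow \bigwedge_{s' \in \mathit{post}_s(a)} C_{s'} \Big) \tag{9} \]
\[ \bigwedge_{\substack{s \in S \\ a \in \mathit{EnAct}(s)}} \Big( C_s \land A_{\mathit{obs}(s),a} \land F_{\mathit{obs}(s)} \rightarrow \bigwedge_{s' \in \mathit{post}_s(a)} D_{s'} \Big) \tag{10} \]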

Similar to (2b), (3), we select at least one action and AVOID-states should not
be reached (8). States reached are closed under the transitive closure, however,
only if we do not switch to taking a shortcut (9). Furthermore, we mark the
states reached after switching (10) and need to select a shortcut for these states.

Algorithm 2 Naive construction of winning regions with incremental encoding
Input: POMDP P, reach-avoid specification φ
Output: Winning region encoded in Win
  Win(z) ← {s ∈ REACH | obs(s) = z} for all z ∈ Ω
  Φ ← Encode(P, φ, Win)                ▷ Create encoding (6),(8)-(13).
  while ∃η s.t. η ⊨ Φ do               ▷ Call an SMT solver
      Win(z) ← Win(z) ∪ {b | s ∈ b iff η(C_s)} for all z ∈ Ω
      Φ ← Encode(P, φ, Win)


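\[ \bigwedge_{s \in S} \big( D_s \rightarrow I_{\mathit{obs}(s)} > 0 \big) \;\land\; \bigwedge_{z \in \Omega} I_z \le |\mathit{Win}(z)| \tag{11} \]
\[ \bigwedge_{\substack{z \in \Omega \\ 0 < i \le |\mathit{Win}(z)|}} \; \bigwedge_{\substack{s \in S \setminus \mathit{Win}(z)_i \\ \mathit{obs}(s) = z}} D_s \rightarrow I_z \neq i \tag{12} \]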

If we reach a state s after switching, then we must pick a shortcut. We can only
pick an index that reflects a found winning region (11). If we pick this shortcut
reflecting a winning region (fragment) for observation z, then we are winning
from the states in Win(z)_i, but not from any other state s with that observation.
Thus, for s ∉ Win(z)_i, if we are going to follow any shortcut (that is, D_s holds),
we should not pick this particular shortcut encoded by I_z (because it will lead
to an AVOID-state). In terms of the policy: Taking this previously computed
policy from state s is not (known to) lead us to a REACH-state (12). Finally,
we update the ranking to account for shortcuts.


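\[ \bigwedge_{s \in S_?} C_s \rightarrow \Big( \bigvee_{a \in \mathit{EnAct}(s)} \big( A_{\mathit{obs}(s),a} \land \big( \bigvee_{s' \in \mathit{post}_s(a)} R_s > R_{s'} \big) \big) \lor F_{\mathit{obs}(s)} \Big) \tag{13} \]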


We make a slight adaptation to (7): Either we have a successor state with a lower
rank (as before) or we follow a shortcut, which either leads to the target or to
violating the specification (13). We formalize the correctness of the encoding:


Lemma 7. If η ⊨ Φ^φ_P(Win), then for every observation z, the belief support
b_z = {s | η(C_s) = true, obs(s) = z} is winning.


Algorithm 2 is a straightforward adaptation of Algorithm 1 that avoids adding
shortcuts explicitly (and uses the updated encoding). As before, the algorithm 
terminates and solves Problem 3. We conclude: 


Theorem 1. In any iteration, Algorithm 2 computes a productive winning region. 


4.4 An Incremental Algorithm 


We adapt the algorithm sketched above to exploit the incrementality of modern 
SMT solvers. Furthermore, we aim to reduce the invocations of the solver by 
finding some extensions to the winning region via a graph-based algorithm. 
Algorithm 3 Incremental construction of winning regions
Input: POMDP P, reach-avoid specification φ
Output: Winning region encoded in Win
  Win(z) ← {s ∈ REACH | obs(s) = z} for all z ∈ Ω
  Win ← GraphPreprocessing(Win)
  Φ_fix ← Encode_fix(P, φ, Win)                     ▷ Create encoding (8)-(13)
  Φ_inc ← Encode_inc(P, φ, Win)                     ▷ Encode (6)
  while ∃η s.t. η ⊨ Φ_fix ∧ Φ_inc do                ▷ Call an SMT solver, fix η
      do                                            ▷ Extend policy
          Φ_pol ← ⋀{A_{z,a} | η(U_z) ∧ η(A_{z,a})}  ▷ Part. fix policy
      while ∃η s.t. η ⊨ Φ_fix ∧ Φ_inc ∧ Φ_pol       ▷ Call SMT, fix η
      Win(z) ← Win(z) ∪ {b | s ∈ b iff η(C_s)} for all z ∈ Ω
      Win ← GraphPreprocessing(Win)
      Φ_fix ← Φ_fix ∧ Encode_fix^upd(P, φ, Win)     ▷ Update: (11),(12)
      Φ_inc ← Encode_inc(P, φ, Win)                 ▷ Encode (6)


Graph-Based Preprocessing. To reduce the number of SMT invocations, we 
employ polynomial-time graph-based heuristics. The first step is to use (fully 
observable) MDP model checking on the POMDP as follows: find all states that 
under each (not necessarily observation-based) policy reach an AVOID-state 
with positive probability, and make them absorbing. Then, we find all states 
that under each policy reach a REACH-state almost-surely. Then, we iteratively 
search for winning observations and use them to extend the REACH-states. An 
observation z is winning, if the belief-support {s | obs(s) = z} is winning. We 
start with a previously determined winning region W. We iteratively update W
by adding states b_z = {s | obs(s) = z} for some observation z, if there is an
action a such that for every s ∈ b_z, it holds that post_s(a) ⊆ W. The iterative
updates are interleaved with MDP model checking on the POMDP as described 
above until we find a fixpoint. 


Optimized Algorithm. We improve Algorithm 2 along four dimensions to obtain 
Algorithm 3. First, we employ fewer updates of the winning region: We aim to 
extend the policy as much as possible, i.e., we want the SMT-solver to find more 
states with the same observation that are winning under the same policy. There- 
fore, we fix the variables for action choices that yield a new winning policy, and 
let the SMT solver search whether we can extend the corresponding winning 
region by finding more states and actions that are compatible with the partial 
policy. Second, we observe that between (outer) iterations, large parts of the 
encoding stay intact, and use an incremental approach in which we first push 
all the constraints from the POMDP onto the stack, then all the constraints 
from the winning region, and finally a constraint that asks for progress. After
we have found a new policy, we pop the last constraint from the stack, add new constraints
regarding the winning region (notice that the old constraints remain
intact), and push new constraints that ask for extending the winning region
to the stack. We refresh the encoding periodically to avoid unnecessary cluttering.
Third, further constraints (1) make the usage of shortcuts more flexible (we
allow taking shortcuts either immediately or after the next action), and (2) enable
an even more incremental encoding with some minor technical reformulations.
Fourth, we add the graph-preprocessing discussed above during the outer iteration.


5 Symbolic Model Checking for the Belief-Support MDP 


In this section, we briefly describe how we encode a given POMDP into a belief- 
support MDP to employ symbolic, off-the-shelf probabilistic model checking. In 
particular, we employ symbolic (decision-diagram, DD) representations of the 
belief-support MDP as we expect this MDP to be huge. Constructing that DD 
representation effectively is not entirely trivial. Instead, we advocate construct- 
ing a (modular) symbolic description of the belief support MDP. Concretely, 
we automatically generate a model description in the MDP modeling language 
JANI [13],⁵ and then apply off-the-shelf model checking on the JANI description.

Conceptually, we create a belief-support MDP with auxiliary states to allow
for a concise encoding.⁶ We use such an auxiliary state b̂ to describe for any transition
the conditioning on the observation. Concretely, a single transition P(b, a, b′) in
the belief-support MDP is reflected by two transitions P(b, a, b̂) and P(b̂, a_⊥, b′)
in our encoding, where a_⊥ is a unique dummy action. We encode states using
triples ⟨belsup, newobs, lact⟩. belsup is a bit vector with entries for every state
s that we use to encode the belief support. Variables newobs and lact store
an observation and an action and are relevant only for the auxiliary states.
Technically, we now encode the first transition from b with the nondeterministic
action a to b̂. P(b, a) then yields (with arbitrary positive probability) a new
observation that will reflect the observation obs(b′). We store a and obs(b′) in
lact and newobs, respectively. The second step is a single deterministic (dummy)
action updating belsup while taking into account newobs. The step also resets
lact and newobs.

The encoding of the transitions is as follows: For the first step, we create nondeterministic
choices for each action a and observation z. We guard these choices
with z, meaning that the edge is only applicable to states having observation z,
i.e., the guard is ⋁_{s∈S, obs(s)=z} belsup(s). With these guarded edges, we define
the destinations: With an arbitrary⁷ probability p, we go to an observation z_1 if
there is at least one state s ∈ belsup which has a successor state s′ ∈ post_s(a)
with obs(s′) = z_1.


5 The description here works on a network of synchronized state machines as is also 
common in the PRISM language. 

6 The usage of message passing or indexed assignments in JANI would circumvent the 
need for intermediate states, but is to the best of our knowledge not supported by 
decision-diagram based model checkers. 

7 We leave this a parametric probability in model building to reduce the number of
different probabilities, as this is beneficial for the size of the decision diagram that
STORM constructs: it will only have leaves 0, p, 1. Technically, such MDPs are not
necessarily well-defined but we can employ model checking on the graph structure.
The following pseudocode reflects the first step in the transition encoding. The
syntax is as follows: take an action if a Boolean guard is satisfied, then updates
are executed with probability prob. An example for a guard is an observation z.


take a if z then
    prob (⋁_{s∈S, P(s,a,z_1)>0} belsup(s) ? p : 0):  newobs ← z_1, lact ← a
    ...
    prob (⋁_{s∈S, P(s,a,z_n)>0} belsup(s) ? p : 0):  newobs ← z_n, lact ← a


The second step synchronously updates each state s′ in the POMDP independently:
The entry belsup(s′) is set to true if obs(s′) = newobs and if there is a
state s currently true in (the old) belsup with s′ ∈ post_s(lact). The step thus
can be captured by the following pseudocode for each s′:


take a_⊥ if true then prob 1: belsup(s′) ← (⋁_{s∈S} belsup(s) ∧ P(s, lact, s′) > 0) ∧ obs(s′) = newobs


Finally, whenever the dummy action a_⊥ is executed, we also reset the variables
newobs and lact. The resulting encoding thus has transitions in the order of
|S| + |Ω|² · max_{z∈Ω} |EnAct(z)|.
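
The two-step transition of the encoding can be mirrored on explicit sets. The following Python sketch first collects the observations that may be seen after an action and then conditions the successor states on the observation. Function names and the example are assumptions made for illustration and are unrelated to the generated JANI description.

```python
def step_one(belsup, action, post, obs):
    """Observations that occur with positive probability after the action."""
    return {obs[t] for s in belsup for t in post[s, action]}


def step_two(belsup, lact, newobs, post, obs):
    """New belief support: successors under lact whose observation is newobs."""
    return frozenset(
        t for s in belsup for t in post[s, lact] if obs[t] == newobs
    )


def belief_support_successors(belsup, action, post, obs):
    """All belief supports reachable from belsup by taking the action."""
    return {
        step_two(belsup, action, z, post, obs)
        for z in step_one(belsup, action, post, obs)
    }


# Hypothetical fragment: states 3 and 4 cannot be told apart afterwards.
obs = {0: "x", 1: "x", 2: "u", 3: "v", 4: "v"}
post = {(0, "a"): {2, 3}, (1, "a"): {3, 4}}
successors = belief_support_successors(frozenset({0, 1}), "a", post, obs)
```

From the belief support {0, 1}, action a leads either to {2} (after observing u) or to {3, 4} (after observing v). The probabilities play no role, in line with the parametric probability p used in the encoding.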


6 Almost-Sure Reachability Shields in POMDPs 


In this section, we define a shield for POMDPs, towards the application of safe
exploration (Challenge 2), that blocks actions which would lead an agent out
of a winning region. In particular, the shield imposes restrictions on policies to
satisfy the reach-avoid specification. Technically, we adapt so-called permissive
policies [21,31] for a belief-support MDP. To force an agent to stay within a
productive winning region W_φ for specification φ, we define a φ-shield ν: B →
2^Act such that for any b winning for φ we have ν(b) ⊆ {a ∈ Act | post_b(a) ⊆
W_φ}, i.e., an action is part of the shield ν(b) only if it exclusively leads to belief
support states within the winning region.

A shield ν restricts the set of actions an arbitrary policy may take⁸. We
call such restricted policies admissible. Specifically, let b_τ be the belief support
after observing an observation sequence τ. Then policy σ is ν-admissible if
supp(σ(τ)) ⊆ ν(b_τ) for every observation sequence τ. Consequently, a policy is
not admissible if for some observation sequence τ, the policy selects an action
a ∈ Act which is not allowed by the shield.
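
As an illustration, the most permissive shield for a given winning region and the admissibility check for a single decision can be written as follows. This Python sketch assumes an explicit successor map on belief supports, and the names shield and is_admissible are ours.

```python
def shield(b, en_act, post, winning_region):
    """Most permissive choice: all actions whose successors stay in the region."""
    return {a for a in en_act[b] if post[b, a] <= winning_region}


def is_admissible(action_support, b, en_act, post, winning_region):
    """A decision is admissible if every action it may take is allowed."""
    return action_support <= shield(b, en_act, post, winning_region)


# Hypothetical belief supports.
b0, goal, trap = frozenset({0, 1}), frozenset({2}), frozenset({3})
en_act = {b0: ["north", "south", "wait"]}
post = {(b0, "north"): {goal}, (b0, "south"): {goal, trap}, (b0, "wait"): {b0}}
winning_region = {b0, goal}
allowed = shield(b0, en_act, post, winning_region)
```

The shield blocks south because it may lead to the trap, but it permits wait although waiting makes no progress. The shield alone therefore only guards the avoid part of the specification.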

Some admissible policies may choose to stay in the winning region without 
progressing towards the REACH states. Such a policy adheres to the avoid-part 
of the specification, but violates the reachability part. To enforce progress, we
adapt a notion of fairness. A policy is fair if it takes every action infinitely often
at any belief support state that appears infinitely often along a trace [5]. For
example, a policy that randomizes (arbitrarily) over all actions is fair; we note
that most reinforcement learning policies are therefore fair.

8 While memory policies based on the belief (support) are sufficient to ensure almost-sure
reachability, the goal is to shield other policies that do not necessarily fall in
this restricted class.

Fig. 2. Video stills from simulating a shielded agent on three different benchmarks.


Theorem 2. For a φ-shield ν and a winning belief support b, any fair ν-admissible
policy satisfies φ from b.


We give a proof (sketch) in [32, Appendix]. The main idea is to show that 
the induced Markov chain of any admissible policy has only bottom SCCs that 
contain REACH-states.


Remark 1. If φ is a safety specification (where Pr(AVOID) = 0 suffices), we
can rely on deadlock-free winning regions rather than productive winning regions 
and drop the fairness assumption. 


7 Empirical Evaluation 


We investigate the applicability of our incremental approach (Algorithm 3) to 
Challenge 1 and Challenge 2, and compare with our adaptation and implementation
of the one-shot approach [15], see Sect. 4.1. We also employ the MDP model-checking
approach from Sect. 5. Experiments, videos, and source code are archived⁹.


Setting. We implemented the one-shot algorithm, our incremental algorithm, 
and the generation of the JANI description of the belief support MDP into the 
model checker STORM [19] on top of the SMT solver z3 [38]. To compare with 
the one-shot algorithm for Problem 1, that is, for finding a policy from the 
initial state, we add a variant of Algorithm 3. Intuitively, any outer iteration 
starts with an SMT-check to see whether we find a policy covering the initial 
states. We realize the latter by fixing (temporarily) the C_s-variables. In the first
iteration, this configuration and its resulting policy closely resemble the one- 
shot approach. For the MDP model-checking approach, we use STORM (from 
the C++ API) with the dd engine and default settings. 

For the experiments, we use a MacBook Pro MV962LL/A, a single core, no 
randomization, and use a 6 GB memory limit. The time-out (TO) is 15 min. 


9 http://doi.org/10.5281/zenodo.4784940 or http://github.com/sjunges/shielding-POMDPs.
Baseline. We compare with the one-shot algorithm including the graph-based
preprocessing to identify more winning observations. We use two setups: (1) We
(manually, a priori) search for optimal hyper-parameters for each instance. We
search for the smallest amount of memory possible, and for the smallest maximal
rank k (being a multiple of five) that yields a result. Guessing parameters
as an “oracle” is time-consuming and unrealistic. (2) We investigate the performance
of the one-shot algorithm by fixing the hyper-parameters to two memory states
and k = 30. These parameters provide results for most benchmarks.


Benchmarks. Our benchmarks involve agents operating in N×N grids, inspired
by, e.g., [12,15,20,50,51]. See Fig. 2 for video stills of simulating the following
benchmarks. Rocks is a variant of rock sample. The grid contains two rocks which 
are either valuable or dangerous to collect. To find out with certainty, the rock 
has to be sampled from an adjacent field. The goal is to collect a valuable rock, 
bring it to the drop-off zone, and not collect dangerous rocks. Refuel concerns a 
rover that shall travel from one corner to the other, while avoiding an obstacle 
on the diagonal. Every movement costs energy and the rover may recharge at 
recharging stations to its full battery capacity E. It receives noisy information
about its position and battery level. Evade is a scenario where a robot needs to 
reach a destination and evade a faster agent. The robot has a limited range of 
vision (R), but may scan the whole grid instead of moving. A certain safe area 
is only accessible by the robot. Intercept is inverse to Evade in the sense that 
the robot aims to meet an agent before it leaves the grid via one of two available 
exits. On top of the view radius, the agent observes a corridor in the center of the 
grid. Avoid is a related scenario where a robot shall keep distance to patrolling 
agents that move with uncertain speed, yielding partial information about their
position. The robot may exploit their predefined routes. Obstacle contains static
obstacles where the robot needs to reach the exit. Its initial state and movement 
are uncertain, and it only observes whether the current position is a trap or exit. 


Results for Challenge 1. Table 1 details the numerical benchmark results. For
each benchmark instance (columns), we report the name and relevant charac-
teristics: the number of states (|S|), the number of transitions (#Tr, the edges
in the graph described by the POMDP), the number of observations (|Ω|), and
the number of belief support states (|b|). For the incremental method, we pro-
vide the run time (Time, in seconds), the number of outer iterations (#Iter.)
in Algorithm 3, the number of invocations of the SMT solver (#solve), and
the approximate size of the winning region (|W|). We then report these numbers
when searching for a policy that wins from the initial state. For the one-shot
method, we provide the time for the optimal parameters (given on the next line;
TOs reflect settings in which we did not find any suitable parameters) and the time
for the preset parameters (2,30), or N/A if no policy can be found with these
parameters. Finally, for (belief-support) MDP model checking, we give only the
run times.

Table 1. Numerical results towards solving Problem 1 and Problem 3.

Inst. | Rocks (N) | Refuel (N,E) | Evade (N,R) | Avoid (N,R) | Intercept (N,R) | Obstacle (N)
Parameters | 4 | 6 | 6,8 | 7,7 | 6,2 | 7,2 | 6,3 | 7,4 | 7,1 | 7,2 | 6 | 8
|S| | 331 | 816 | 270 | 302 | 4232 | 8108 | 5976 | 13021 | 4705 | 4705 | 37 | 65
#Tr | 3484 | 7292 | 1301 | 1545 | 28866 | 57570 | 14373 | 33949 | 18049 | 18049 | 224 | 421
|Ω| | 65 | 74 | 36 | 35 | 2202 | 4172 | 3300 | 8584 | 2002 | 2598 | 4 | 4
|b| | 3.5E5 | 7.7E25 | 5.6E14 | 7.4E19 | 1.1E8 | 4.4E11 | 1.1E15 | 2.9E17 | 6.4E10 | 2.7E9 | 1.1E9 | 2.9E17
Incremental, fixpoint:
Time | 19 | 753 | 6 | 3 | 142 | 613 | 167 | 745 | 116 | 86 | 2 | 30
#Iter. | 36 | 284 | 140 | 30 | 4 | 6 | 3 | 4 | 8 | 8 | 68 | 150
#solve | 1702 | 13650 | 1023 | 528 | 681 | 129 | 629 | 027 | 1171 | 976 | 839 | 4291
|W| | 3.5E5 | 7.7E25 | 1.2E11 | 2.1E8 | 1.0E8 | 4.2E11 | 1.1E15 | 2.9E17 | 9.2E4 | 2.9E4 | 4.1E7 | 3.8E14
Incremental, initial state:
Time | 17 | 226 | 2 | 2 | 49 | 576 | 10 | 40 | 11 | 2 | ey | <1
#Iter. | 29 | 65 | 2 | 4 | 1 | 1 | 2 | 1 | 10 | 12
#solve | 1215 | 2652 | 62 | 80 | 1 | 1 | 81 | 1 | 114 | 229
|W| | 4.4E4 | 1.8E13 | 8.4E6 | 3.7E4 | 5.0E7 | 1.0E11 | 3.7E5 | 6.9E10 | 6.2E3 | 2.1E3 | 4.1E5 | 4.5E9
One-shot, optimal parameters:
Time | 120 | TO | 2 | <1 | 12 | 270 | 22 | 53 | 8 | 1 | 1 | 195
Mem,k | 2,10 | ? | 2,15 | 2,15 | 1,20 | 1,30 | 1,30 | 25 | 2,10 | 1,10 | 6,10 | 5,50
One-shot, fixed parameters (2,30):
Time | TO | TO | lu | 37 | TO | TO | TO | TO | 28 | 18 | N/A | N/A
MDP:
Time | 400 | TO | 219 | MO | TO | TO | TO | TO | TO | TO | 6 | MO


The incremental algorithm finds winning policies for the initial state without
guessing parameters and is often faster than the one-shot approach with an
oracle providing optimal parameters, and significantly faster than the one-shot
approach with reasonably fixed parameters. In detail, Rocks shows that we can 
handle large numbers of iterations, solver invocations, and winning regions. The 
incremental approach scales to larger models, see e.g., Avoid. Refuel shows a 
large sensitivity of the one-shot method on the lookahead (going from 15 to 30 
increases the runtime), while Evade shows sensitivity to memory (from 1 to 2). 
In contrast, the incremental approach does not rely on user-input, yet deliv- 
ers comparable performance on Refuel or Avoid. It suffers slightly on Evade, 
where the one-shot approach has reduced overhead. We furthermore conclude 
that off-the-shelf MDP model checking is not a fast alternative. Its advantage
is the guarantee to find the maximal winning region; however, for our bench-
marks, maximal winning regions (empirically) coincide with the results from the
incremental fixpoint approach.


Results for Challenge 2. Winning regions obtained from running incrementally
to a fixpoint are significantly larger than those obtained when running only until an
initial winning policy is found (cf. Table 1), but computing them requires extra
computational effort.

If a shielded agent moves randomly through the grid-worlds, the larger win- 
ning regions indeed induce more permissiveness, that is, freedom to move for the 
agent (cf. the videos, Fig. 2). This observation can also be quantified. In Table 2, 
we compare the two different types of shields. For both, we give the average and stan-
dard deviation of permissiveness over 250 paths. We choose to approximate per-
missiveness along a path as the number of cumulative actions allowed by the per- 
missive scheduler along a path, divided by the number of cumulative actions avail- 
able in the POMDP along that path. As the shield is correct by construction, each 
run indeed never visits avoid states and eventually reaches the target (albeit after 
many steps). This statement is not true for the unshielded agents. 
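
The permissiveness measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation script: the representation of a path as a list of (allowed, available) action counts and the names `permissiveness` and `summarize` are assumptions made for this example.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

# One step of a path: (actions allowed by the shield, actions available in the POMDP).
# This encoding is hypothetical and chosen only for illustration.
Step = Tuple[int, int]


def permissiveness(path: Sequence[Step]) -> float:
    """Cumulative allowed actions divided by cumulative available actions."""
    allowed = sum(a for a, _ in path)
    available = sum(n for _, n in path)
    if available == 0:
        raise ValueError("path has no available actions")
    return allowed / available


def summarize(paths: List[Sequence[Step]]) -> Tuple[float, float]:
    """Average and sample standard deviation of permissiveness over several paths."""
    values = [permissiveness(p) for p in paths]
    return mean(values), stdev(values)
```

For example, a three-step path on which the shield allows 2, 1, and 4 of 4 available actions has permissiveness 7/12.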
Table 2. Quantification of permissiveness using fraction of allowed actions.

Inst. | Rocks (N) | Refuel (N,E) | Evade (N,R) | Avoid (N,R) | Intercept (N,R) | Obstacle (N)
Parameters | 4 | 6 | 6,8 | 7,7 | 6,2 | 7,2 | 6,3 | 7,4 | 7,1 | 7,2 | 6 | 8
initial avg | 0.85 | 0.81 | 0.43 | 0.36 | 0.62 | 0.50 | 0.51 | 0.56 | 0.45 | 0.47 | 0.68 | 0.74
initial stdev | 0.066 | 0.070 | 0.046 | 0.014 | 0.046 | 0.043 | 0.013 | 0.019 | 0.037 | 0.047 | 0.040 | 0.047
fixpoint avg | 0.88 | 0.89 | 0.77 | 0.73 | 0.86 | 0.87 | 0.78 | 0.80 | 0.78 | 0.84 | 0.73 | 0.73
fixpoint stdev | 0.060 | 0.037 | 0.037 | 0.024 | 0.015 | 0.016 | 0.015 | 0.017 | 0.078 | 0.070 | 0.036 | 0.059


8 Conclusion 


We provided an incremental approach to find POMDP policies that satisfy 
almost-sure reachability specifications. The superior scalability is demonstrated 
on a string of benchmarks. Furthermore, this approach allows us to shield agents in
POMDPs and guarantees that any exploration of an environment satisfies the 
specification, without needlessly restricting the freedom of the agent. We plan to 
investigate a tight interaction with state-of-the-art reinforcement learning and 
quantitative verification of POMDPs. For the latter, we expect that an explicit 
approach to model checking the belief-support MDP can be feasible. 
Abstract. We present a detailed study of roundoff errors in probabilistic
floating-point computations. We derive closed-form expressions for the 
distribution of roundoff errors associated with a random variable, and 
we prove that roundoff errors are generally close to being uncorrelated 
with their generating distribution. Based on these theoretical advances, 
we propose a model of IEEE floating-point arithmetic for numerical 
expressions with probabilistic inputs and an algorithm for evaluating this 
model. Our algorithm provides rigorous bounds to the output and error 
distributions of arithmetic expressions over random variables, evaluated 
in the presence of roundoff errors. It keeps track of complex dependen- 
cies between random variables using an SMT solver, and is capable of 
providing sound but tight probabilistic bounds to roundoff errors using 
symbolic affine arithmetic. We implemented the algorithm in the PAF 
tool, and evaluated it on FPBench, a standard benchmark suite for the 
analysis of roundoff errors. Our evaluation shows that PAF computes 
tighter bounds than current state-of-the-art on almost all benchmarks. 


1 Introduction 


There are two common sources of randomness in a numerical computation (a 
straight-line program). First, the computation might be using inherently noisy 
data, for example from analog sensors in cyber-physical systems such as robots, 
autonomous vehicles, and drones. A prime example is data from GPS sensors, 
whose error distribution can be described very precisely [2] and which we study in 
some detail in Sect. 2. Second, the computation itself might sample from random 
number generators. Such probabilistic numerical routines, known as Monte-Carlo 
methods, are used in a wide variety of tasks, such as integration [34,42], opti-
mization [43], finance [25], fluid dynamics [32], and computer graphics [30]. We
call numerical computations whose input values are sampled from some proba-
bility distributions probabilistic computations.


Supported in part by the National Science Foundation awards CCF 1552975, 1704715,
the Engineering and Physical Sciences Research Council (EP/P010040/1), and the
Leverhulme Project Grant “Verification of Machine Learning Algorithms”.

© The Author(s) 2021

A. Silva and K. R. M. Leino (Eds.): CAV 2021, LNCS 12760, pp. 626-650, 2021.
https://doi.org/10.1007/978-3-030-81688-9_29

Roundoff Error Analysis of Probabilistic Floating-Point Computations

Probabilistic computations are typically implemented using floating-point 
arithmetic, which leads to roundoff errors being introduced in the computation. 
To strike the right balance between the performance and energy consumption 
versus the quality of the computed result, expert programmers rely on either 
a manual or automated floating-point error analysis to guide their design deci- 
sions. However, the current state-of-the-art approaches in this space have pri-
marily focused on worst-case roundoff error analysis of deterministic computa-
tions. So what can we say about floating-point roundoff errors in a probabilistic
context? Is it possible to probabilistically quantify them by computing confidence 
intervals? Can we, for example, say with 99% confidence that the roundoff error 
of the computed result is smaller than some chosen constant? What is the dis- 
tribution of outputs when roundoff errors are taken into account? In this paper, 
we explore these and similar questions. To answer them, we propose a rigorous 
— that is to say sound — approach to quantifying roundoff errors in probabilis- 
tic computations. Based on this approach, we develop an automatic tool that 
efficiently computes an overapproximate probabilistic profile of roundoff errors. 

As an example, consider the floating-point arithmetic expression (X + Y) ÷ Y,
where X and Y are random inputs represented by independent random variables.
In Sect. 4, we first show how the computation in finite-precision of a single arith-
metic operation such as X + Y can be modeled as (X + Y)(1 + ε), where ε is
also a random variable. We then show how this random variable can be computed
from first principles and why it makes sense to view (X + Y) and (1 + ε) as inde-
pendent expressions, which in turn allows us to easily compute the distribution of
(X + Y)(1 + ε). The distribution of ε depends on that of X + Y, and we there-
fore need to evaluate arithmetic operations between random variables. When the
operands are independent (as in X + Y) this is standard [48], but when the
operands are dependent (as in the case of the division in (X + Y) ÷ Y) this is a
hard problem. To solve it, we adopt and improve a technique for soundly bound-
ing these distributions described in [3]. Our improvement comes from the use of an
SMT solver to reason about the dependency between (X + Y) and Y and remove
regions of the state-space with zero probability. We describe this in Sect. 6.

We can thus soundly bound the output distribution of any probabilistic com-
putation, such as (X + Y) ÷ Y, performed in floating-point arithmetic. This gives
us the ability to perform probabilistic range analysis and prove rigorous asser- 
tions like: 99% of the outputs of a floating-point computation are smaller than a 
given constant bound. In order to perform probabilistic roundoff error analysis 
we develop symbolic affine arithmetic in Sect. 5. This technique is combined with 
probabilistic range analysis to compute conditional roundoff errors. Specifically, 
we over-approximate the maximal error conditioned on the output landing in the 
99% range computed by the probabilistic range analysis, meaning conditioned 
on the computations not returning an outlier. 

We implemented our model and algorithms in a tool called PAF (for Prob- 
abilistic Analysis of Floating-point errors). We evaluated PAF on the standard 
floating-point benchmark suite FPBench [11], and compared its range and error 
analysis with the worst-case roundoff error analyzer FPTaylor [46,47] and the
probabilistic roundoff error analyzer PrAn [36]. We present the results in Sect. 7, 
and show that FPTaylor’s worst-case analysis is often overly pessimistic in the 
probabilistic setting, while PAF also generates tighter probabilistic error bounds 
than PrAn on almost all benchmarks. 

We summarize our contributions as follows: 


(i) We derive a closed-form expression (6) for the distribution of roundoff errors 
associated with a random variable. We prove that roundoff errors are gen- 
erally close to being uncorrelated with their input distribution. 

(ii) Based on these results we propose a model of IEEE 754 floating-point arith- 
metic for numerical expressions with probabilistic inputs. 

(iii) We evaluate this model by developing a new algorithm for rigorously bound- 
ing the output range and roundoff error distributions of floating-point arith- 
metic expressions with probabilistic inputs. 

(iv) We implement this model in the PAF tool,¹ and perform probabilistic range
and roundoff error analysis on a standard benchmark suite. Our comparison 
with the current state-of-the-art shows the advantages of our approach in 
terms of computing tighter, and yet still rigorous, probabilistic bounds. 


2 Motivating Example 


GPS sensors are inherently noisy. Bornholt [1] shows that the conditional prob- 
ability of the true coordinates given a GPS reading is distributed according to a 
Rayleigh distribution. Interestingly, since the density of any Rayleigh distribu- 
tion is always zero at x = 0, it is extremely unlikely that the true coordinates lie 
in a small neighborhood of those given by the GPS reading. This leads to errors, 
and hence the sensed coordinates should be corrected by adding a probabilistic 
error term which, on average, shifts the observed coordinates into an area of high 
probability for the true coordinates [1,2]. The latitude correction is given by: 


TrueLat = GPSLat + ((radius * sin(angle)) * DPERM),    (1)


where radius is Rayleigh distributed, angle uniformly distributed, GPSLat is 
the latitude, and DPERM a constant for converting meters into degrees. 
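
To make Eq. (1) concrete, the following sketch samples the latitude correction in ordinary double precision. It is only an illustration under stated assumptions: the Rayleigh scale `SIGMA`, the angle range [0, 2π), the numeric value of `DPERM` (derived from 1° ≈ 111 km), and the approximate Greenwich latitude are all chosen for this example and are not taken from the paper.

```python
import math
import random

DPERM = 1.0 / 111_000.0  # assumed: degrees per meter, from 1 degree ~ 111 km
GPS_LAT = 51.4769        # assumed: approximate latitude of Greenwich, in degrees
SIGMA = 5.0              # hypothetical Rayleigh scale parameter, in meters


def sample_true_lat(rng: random.Random) -> float:
    """Draw one sample of TrueLat according to Eq. (1)."""
    v = 1.0 - rng.random()                           # uniform on (0, 1], avoids log(0)
    radius = SIGMA * math.sqrt(-2.0 * math.log(v))   # Rayleigh sample via inverse CDF
    angle = rng.uniform(0.0, 2.0 * math.pi)          # assumed range of the uniform angle
    return GPS_LAT + ((radius * math.sin(angle)) * DPERM)
```

Because the Rayleigh support is unbounded, `radius` can in principle be arbitrarily large, which is exactly what makes a worst-case analysis of this expression problematic.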

A developer trying to strike the right balance between resources, such as 
energy consumption or execution time, versus the accuracy of the computation, 
might want to run a rigorous worst-case floating-point analysis tool to determine 
which floating-point format is accurate enough to process GPS signals. This is 
mandatory if the developer requires rigorous error bounds holding with 100% 
certainty. The problem when analyzing a piece of code involving (1) is that the
Rayleigh distribution has [0, ∞) as its support, and any worst-case roundoff error
analysis will return an infinite error bound in this situation. To get a meaningful
(numeric) error bound, we need to truncate the support of the distribution. The
most conservative truncation is [0, max], where max is the largest representable
number (not causing an overflow) at the target floating-point precision format.


¹ PAF is open source and publicly available at https://github.com/soarlab/paf.
Table 1. Roundoff error analysis for the probabilistic latitude correction of (1).


Precision | Max    | FPTaylor | PAF 100% | PAF 99.9999%
          |        |          |          | Absolute | Meters
Double    | 10^307 | 4.3e+286 | 4.3e+286 | 4.1e-15  | 4.5e-10
Single    | 10^38  | 2.1e+26  | 2.1e+26  | 3.7e-06  | 4.1e-1
Half      | 10^4   | 2.5e-2   | 2.5e-2   | 2.4e-2   | 2667


In Table 1, we report a detailed roundoff error analysis of (1) implemented
in IEEE 754 double-, single-, and half-precision formats, with GPSLat set to
the latitude of the Greenwich observatory. With each floating-point format, we
associate the range [0, max] of the truncated Rayleigh distribution. We compute
worst-case roundoff error bounds for (1) with the state-of-the-art error analyzer
FPTaylor [47] and with our tool PAF by setting the confidence interval to 100%.
As expected, the error bounds from the two tools are identical. Finally, we com-
pute the 99.9999% conditional roundoff error using PAF. This value is an upper
bound to the roundoff error conditioned on the computation having landed in
an interval capturing 99.9999% of all possible outputs. Column Absolute gives
the error in degrees and Meters in meters (1° ≈ 111 km).

By looking at the results obtained without our probabilistic error analysis 
(columns FPTaylor and PAF 100%), the developer might erroneously conclude 
that half-precision format is the most appropriate to implement (1) because it 
results in the smallest error bound. However, with the information provided by 
the 99.9999% conditional roundoff error, the developer can see that the average 
error is many orders of magnitude smaller than the worst-case scenarios. Armed 
with this information, the developer can conclude that with a roundoff error of 
roughly 40 cm (4.1e-1 m) when correcting 99.9999% of GPS latitude readings,
working in single-precision is an adequate compromise between efficiency and 
accuracy of the computation. 
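
The conversion between the Absolute and Meters columns is simple arithmetic and can be checked directly. The sketch below uses only the conversion factor stated in the text (1° ≈ 111 km); the function name `to_meters` is introduced here for illustration.

```python
METERS_PER_DEGREE = 111_000.0  # 1 degree ~ 111 km, as stated in the text


def to_meters(error_in_degrees: float) -> float:
    """Convert an absolute latitude error from degrees to meters."""
    return error_in_degrees * METERS_PER_DEGREE


# Absolute 99.9999% conditional errors reported for each format, in degrees.
absolute_errors = {"double": 4.1e-15, "single": 3.7e-06, "half": 2.4e-2}
for name, err in absolute_errors.items():
    print(f"{name}: {to_meters(err):.3g} m")
```

The results agree with the reported Meters column up to the rounding of the displayed digits; for single precision the error is about 0.41 m, i.e., roughly 40 cm.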

This motivates the innovative concept of probabilistic precision tuning, evolved
from standard worst-case precision tuning [5,12], to determine which floating-
point format is the most appropriate for a given computation. As an example, let
us do a probabilistic precision tuning exercise for the latitude correction compu-
tation of (1). We truncate the Rayleigh distribution in the interval [0, 10^307], and
assume we can tolerate up to 1e-5 roundoff error (roughly 1 m). First, we man-
ually perform worst-case precision tuning using FPTaylor to determine that the
minimal floating-point format not violating the given error bound needs 1022 man-
tissa and 11 exponent bits. Such a large custom format is prohibitively expensive,
in particular for devices performing frequent GPS readings, like smartphones or
smartwatches. Conversely, when we manually perform probabilistic precision tun-
ing using PAF with a confidence interval set to 99.9999%, we determine we need
only 22 mantissa and 11 exponent bits. Thanks to PAF, the developer can provide
a custom confidence interval of interest to the probabilistic precision tuning rou-
tine to adjust for the extremely unlikely corner cases like the ones we described for
(1), and ultimately obtain better tuning results.
3 Preliminaries


3.1 Floating-Point Arithmetic 


Given a precision $p \in \mathbb{N}$ and an exponent range $[e_{min}, e_{max}] \triangleq \{n \mid n \in
\mathbb{N} \wedge e_{min} \le n \le e_{max}\}$, we define $\mathbb{F}(p, e_{min}, e_{max})$, or simply $\mathbb{F}$ if there is no
ambiguity, as the set of extended real numbers

\[ \mathbb{F} \triangleq \left\{ (-1)^s 2^e \left(1 + \frac{k}{2^p}\right) \,\middle|\, s \in \{0,1\},\ e \in [e_{min}, e_{max}],\ 0 \le k < 2^p \right\} \cup \{-\infty, 0, \infty\}. \]

Elements $z = z(s, e, k) \in \mathbb{F}$ will be called floating-point representable numbers
(for the given precision $p$ and exponent range $[e_{min}, e_{max}]$) and we will use the
variable $z$ to represent them. The variable $s$ will be called the sign, the variable
$e$ the exponent, and the variable $k$ the significand of $z(s, e, k)$.

Next, we introduce a rounding map $\mathrm{Round} : \mathbb{R} \to \mathbb{F}$ that rounds to nearest
(or to $-\infty$/$\infty$ for values smaller/greater than the smallest/largest finite element
of $\mathbb{F}$) and follows any of the IEEE 754 rounding modes in case of a tie. We will
not worry about which choice is made since the set of mid-points will always have
probability zero for the distributions we will be working with. All choices are thus
equivalent, probabilistically speaking, and what happens in a tie can therefore
be left unspecified. We will denote the extended real line by $\overline{\mathbb{R}} \triangleq \mathbb{R} \cup \{-\infty, \infty\}$.
The (signed) absolute error function $\mathrm{err}_{abs} : \mathbb{R} \to \overline{\mathbb{R}}$ is defined as
$\mathrm{err}_{abs}(x) = x - \mathrm{Round}(x)$. We define the sets
$\lfloor z, z \rceil \triangleq \{y \in \mathbb{R} \mid \mathrm{Round}(y) = \mathrm{Round}(z)\}$. Thus
if $z \in \mathbb{F}$, then $\lfloor z, z \rceil$ is the collection of all reals rounding to $z$. As the reader will
see, the basic result of Sect. 4 (Eq. (5)) is expressed entirely using the notation
$\lfloor z, z \rceil$, which is parametric in the choice of the Round function. It follows that
our results apply to rounding modes other than round-to-nearest with minimal
changes. The relative error function $\mathrm{err}_{rel} : \mathbb{R} \setminus \{0\} \to \overline{\mathbb{R}}$ is defined by

\[ \mathrm{err}_{rel}(x) = \frac{x - \mathrm{Round}(x)}{x}. \]

Note that $\mathrm{err}_{rel}(x) = 1$ on $\lfloor 0, 0 \rceil \setminus \{0\}$, $\mathrm{err}_{rel}(x) = \infty$ on $\lfloor -\infty, -\infty \rceil$, and
$\mathrm{err}_{rel}(x) = -\infty$ on $\lfloor \infty, \infty \rceil$. Recall also the fact [26] that
$-2^{-(p+1)} < \mathrm{err}_{rel}(x) < 2^{-(p+1)}$ outside of
$\lfloor 0, 0 \rceil \cup \lfloor -\infty, -\infty \rceil \cup \lfloor \infty, \infty \rceil$. The quantity $2^{-(p+1)}$ is usually
called the unit roundoff and will be denoted by u.
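
The bound on the relative error by the unit roundoff can be observed empirically. The sketch below is an illustration rather than part of PAF: it rounds Python doubles to IEEE 754 single precision with the standard `struct` module and assumes p = 23, the number of fraction bits of single precision in the 1 + k/2^p representation used above. The sampling range [1, 1000] is arbitrary and keeps all values in the normal range.

```python
import random
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def err_rel(x: float) -> float:
    """Relative error (x - Round(x)) / x for rounding to single precision."""
    return (x - round_to_single(x)) / x


P = 23                 # fraction bits of single precision (assumed reading of p)
U = 2.0 ** -(P + 1)    # unit roundoff u = 2^-(p+1)

rng = random.Random(1)
worst = max(abs(err_rel(rng.uniform(1.0, 1000.0))) for _ in range(100_000))
print(f"largest observed |err_rel| / u = {worst / U:.4f}")
```

The observed maximum stays strictly below u, in line with the inequality recalled from [26]; it says nothing yet about how the errors are distributed inside that interval, which is the subject of Sect. 4.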

For $z_1, z_2 \in \mathbb{F}$ and $\mathrm{op} \in \{+, -, \times, \div\}$ an (infinite-precision) arithmetic oper-
ation, the traditional model of IEEE 754 floating-point arithmetic [26,39] states
that the finite-precision implementation $\mathrm{op}_m$ of $\mathrm{op}$ must satisfy


\[ z_1 \ \mathrm{op}_m\ z_2 = (z_1 \ \mathrm{op}\ z_2)(1 + \delta), \qquad |\delta| \le u. \tag{2} \]


We leave dealing with subnormal floating-point numbers to future work. The 
model given by Eq. (2) stipulates that the implementation of an arithmetic 
operation can induce a relative error of magnitude at most u. The exact size of 
the error is, however, not specified and Eq. (2) is therefore a non-deterministic 
model of computation. It follows that numerical analyses based on Eq. (2) must
consider all possible relative errors $\delta$ and are fundamentally worst-case analyses.
Since the output of such a program might be the input of another, one should 
also consider non-deterministic inputs, and this is indeed what happens with 
automated tools for roundoff error analysis, such as Daisy [12] or FPTaylor [46, 
47], which require for each variable of the program a (bounded) range of possible 
values in order to perform a worst-case analysis (cf. GPS example in Sect. 2). 
In this paper, we study a model formally similar to Eq. (2), namely 


\[ z_1 \ \mathrm{op}_m\ z_2 = (z_1 \ \mathrm{op}\ z_2)(1 + \delta), \qquad \delta \sim \mathit{dist}. \tag{3} \]


The difference is that $\delta$ is now distributed according to dist, a probability distribu-
tion whose support is [—u, u]. In other words, we move from a non-deterministic 
to a probabilistic model of roundoff errors. This is similar to the ‘Monte Carlo 
arithmetic’ of [41], but whilst op. cit. postulates that dist is the uniform distri- 
bution on [—u, u], we compute dist from first principles in Sect. 4. 


3.2 Probability Theory 


To fix the notation and be self-contained, we present some basic notions of 
probability theory which are essential to what follows. 


Cumulative Distribution Functions and Probability Density Func- 
tions. We assume that the reader is (at least intuitively) familiar with the notion 
of a (real) random variable. Given a random variable X we define its Cumulative 
Distribution Function (CDF) as the function $c(t) = P[X \le t]$. If there exists a
non-negative integrable function $d : \mathbb{R} \to \mathbb{R}$ such that


\[ c(t) = P[X \le t] = \int_{-\infty}^{t} d(s)\, ds \]


then we call d(t) the Probability Density Function (PDF) of X. If it exists,
then it can be recovered from the CDF by differentiation, $d(t) = \frac{\partial}{\partial t} c(t)$, by the
fundamental theorem of calculus.

Not all random variables have a PDF: consider the random variable which
takes value 0 with probability 1/2 and value 1 with probability 1/2. For this
random variable it is impossible to write $P[X \le t] = \int d(t)\, dt$. Instead, we will
write the distribution of such a variable using the so-called Dirac delta measure
at 0 and 1 as $\frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$. It is possible for a random variable to have a PDF
covering part of its distribution (its continuous part) and a sum of Dirac
deltas covering the rest of its distribution (its discrete part). We will encounter
examples of such random variables in Sect. 4. Finally, if X is a random variable
and $f : \mathbb{R} \to \mathbb{R}$ is a measurable function, then f(X) is a random variable. In
particular $\mathrm{err}_{rel}(X)$ is a random variable which we will describe in Sect. 4.


Arithmetic on Random Variables. Suppose X,Y are independent random 
variables with PDFs fx and fy, respectively. Using the arithmetic operations we 
can form new random variables X + Y, X − Y, X × Y, X ÷ Y. The PDFs of these
new random variables can be expressed as operations on fx and fy, which can
be found in [48]. It is important to note that these operations are only valid if X
and Y are assumed to be independent. When an arithmetic expression containing
variable repetitions is given a random variable interpretation, this independence
can no longer be assumed. In the expression (X + Y) ÷ Y the sub-term (X + Y)
can be interpreted by the formulas of [48] if X, Y are independent. However, the
sub-terms X + Y and Y cannot be interpreted in this way since X + Y and Y
are clearly not independent random variables.
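
A small Monte-Carlo experiment illustrates why the dependence matters. The sketch below is not sound and is not the paper's method; it only contrasts the true expression (X + Y) ÷ Y with a version that wrongly treats the denominator as an independent copy of Y. The choice X, Y ~ Unif(1, 2) is an assumption made for this example.

```python
import random

rng = random.Random(0)
N = 100_000

dependent = []     # (X + Y) / Y with the same Y in numerator and denominator
independent = []   # (X + Y) / Y' where Y' is an independent copy of Y

for _ in range(N):
    x = rng.uniform(1.0, 2.0)
    y = rng.uniform(1.0, 2.0)
    y_fresh = rng.uniform(1.0, 2.0)
    dependent.append((x + y) / y)
    independent.append((x + y) / y_fresh)

print("dependent range:  ", min(dependent), max(dependent))
print("independent range:", min(independent), max(independent))
```

Since (X + Y) ÷ Y = X ÷ Y + 1, the dependent samples can only lie in [1.5, 3], whereas ignoring the dependence spreads mass over the larger interval [1, 4]. Regions such as these, which have zero probability under the true dependence, are what a dependency-aware analysis can remove.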


Soundly Bounding Probabilities. The constraint that the distribution of
a random variable must integrate to 1 makes it impossible to order random
variables in the ‘natural’ way: if $P[X \in A] \le P[Y \in A]$, then $P[Y \in A^c] \le
P[X \in A^c]$, i.e., we cannot say that $X \le Y$ if $P[X \in A] \le P[Y \in A]$. This
means that we cannot quantify our probabilistic uncertainty about a random
variable by sandwiching it between two other random variables as one would do
with reals or real-valued functions. One solution is to restrict the sets used in
the comparison, i.e., declare that $X \le Y$ iff $P[X \in A] \le P[Y \in A]$ for A ranging
over a given set of ‘test subsets’. Such an order can be defined by taking as ‘test
subsets’ the intervals $(-\infty, x]$ [44]. This order is called the stochastic order. It
follows from the definition of the CDF that this order can be defined by simply
saying that $X \le Y$ iff $c_X \le c_Y$, where $c_X$ and $c_Y$ are the CDFs of X and Y,
respectively. If it is possible to sandwich an unknown random variable X between
known lower and upper bounds $X_{lower} \le X \le X_{upper}$ using the stochastic order
then it becomes possible to give sound bounds to the quantities $P[X \in [a, b]]$ via


\[ P[X \in [a, b]] = c_X(b) - c_X(a) \le c_{X_{upper}}(b) - c_{X_{lower}}(a). \]
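
The inequality above can be tried out on a toy example. In this sketch the unknown X is Unif(0, 1), sandwiched between two shifted uniform distributions; the specific distributions and the helper names `uniform_cdf` and `interval_probability_bound` are assumptions made purely for illustration.

```python
from typing import Callable

CDF = Callable[[float], float]


def uniform_cdf(lo: float, hi: float) -> CDF:
    """CDF of the uniform distribution on [lo, hi]."""
    def cdf(x: float) -> float:
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return cdf


def interval_probability_bound(c_upper: CDF, c_lower: CDF, a: float, b: float) -> float:
    """Upper bound on P[X in [a, b]] from a pair of sandwiching CDFs."""
    return min(1.0, c_upper(b) - c_lower(a))


c_x = uniform_cdf(0.0, 1.0)        # the "unknown" distribution, used only for checking
c_lower = uniform_cdf(0.1, 1.1)    # pointwise below c_x
c_upper = uniform_cdf(-0.1, 0.9)   # pointwise above c_x

bound = interval_probability_bound(c_upper, c_lower, 0.2, 0.5)
exact = c_x(0.5) - c_x(0.2)
print(f"exact = {exact:.3f}, sound upper bound = {bound:.3f}")
```

The bound is valid for every distribution whose CDF lies between the two bounding CDFs, which is why it is looser than the exact value for this particular X.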


P-Boxes and DS-Structures. As mentioned above, giving a random variable
interpretation to an arithmetic expression containing variable repetitions cannot
be done using the arithmetic of [48]. In fact, these interpretations are in general
analytically intractable. Hence, a common approach is to give up on soundness
and approximate such distributions using Monte-Carlo simulations. We use this
approach in our experiments to assess the quality of our sound results. However,
we will also provide sound under- and over-approximations of the distribution of
arithmetic expressions over random variables using the stochastic order discussed
above. Since $X_{lower} \le X \le X_{upper}$ is equivalent to saying that $c_{X_{lower}}(x) \le
c_X(x) \le c_{X_{upper}}(x)$, the fundamental approximating structure will be a pair of
CDFs satisfying $c_1(x) \le c_2(x)$. Such a structure is known in the literature as
a p-box [19], and has already been used in the context of probabilistic roundoff
errors in related work [3,36]. The data of a p-box is equivalent to a pair of
sandwiching distributions for the stochastic order.

A Dempster-Shafer structure (DS-structure) of size N is a collection (i.e., set)
of interval-probability pairs $\{([x_0, y_0], p_0), ([x_1, y_1], p_1), \ldots, ([x_N, y_N], p_N)\}$ where
$\sum_{i=0}^{N} p_i = 1$. The intervals in the collection might overlap. One can always
convert a DS-structure to a p-box and back again [19], but arithmetic operations
are much easier to perform on DS-structures than on p-boxes ([3]), which is why
we will use DS-structures in the algorithm described in Sect. 6.
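
The relationship between DS-structures and p-boxes can be sketched as follows. This is a minimal illustration and not PAF's implementation: a DS-structure is encoded as a list of ((lo, hi), probability) pairs, and the addition shown is the textbook construction for independent operands only; handling dependent operands is precisely what requires the extra machinery of Sect. 6. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Focal = Tuple[Tuple[float, float], float]   # ((lo, hi), probability)
DS = List[Focal]


def ds_to_pbox(ds: DS) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Return (lower CDF, upper CDF) enclosing every distribution compatible with ds."""
    def lower_cdf(x: float) -> float:
        # Mass that is certainly <= x: intervals lying entirely at or below x.
        return sum(p for (lo, hi), p in ds if hi <= x)

    def upper_cdf(x: float) -> float:
        # Mass that is possibly <= x: intervals starting at or below x.
        return sum(p for (lo, hi), p in ds if lo <= x)

    return lower_cdf, upper_cdf


def ds_add_independent(a: DS, b: DS) -> DS:
    """Sum of two independent DS-structures: interval sums with product masses."""
    return [((l1 + l2, h1 + h2), p1 * p2)
            for ((l1, h1), p1), ((l2, h2), p2) in product(a, b)]


ds = [((0, 1), 0.5), ((1, 2), 0.5)]
lower, upper = ds_to_pbox(ds)
print(lower(1), upper(1))   # lower and upper bounds on P[X <= 1]
```

The pairwise product in `ds_add_independent` makes it clear why arithmetic is convenient on DS-structures: each operation acts on plain intervals, and the probabilities are carried along separately.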
4 Distribution of Floating-Point Roundoff Errors


Our tool PAF computes probabilistic roundoff errors by conditioning the max-
imization of a symbolic affine form (presented in Sect. 5) on the output of the
computation landing in a confidence interval. The purpose of this section is to 
provide the necessary probabilistic tools to compute these intervals. In other 
words, this section provides the foundations of probabilistic range analysis. All 
proofs can be found in the extended version [7]. 


4.1 Derivation of the Distribution of Rounding Errors 


Recall the probabilistic model of Eq. (3) where $\mathrm{op}$ is an infinite-precision
arithmetic operation and $\mathrm{op}_m$ its finite-precision implementation:


\[ z_1 \ \mathrm{op}_m\ z_2 = (z_1 \ \mathrm{op}\ z_2)(1 + \delta), \qquad \delta \sim \mathit{dist}. \]


Let us also assume that $z_1, z_2$ are random variables with known distributions.
Then $z_1 \ \mathrm{op}\ z_2$ is also a random variable which can (in principle) be computed.
Since the IEEE 754 standard states that $z_1 \ \mathrm{op}_m\ z_2$ is computed by rounding the
infinite-precision operation $z_1 \ \mathrm{op}\ z_2$, it is a completely natural consequence of
the standard to require that $\delta$ is simply given by


\[ \delta = \mathrm{err}_{rel}(z_1 \ \mathrm{op}\ z_2). \]


Thus, dist is the distribution of the random variable $\mathrm{err}_{rel}(z_1 \ \mathrm{op}\ z_2)$. More gen-
erally, if X is a random variable with known distribution, we will show how to
compute the distribution dist of the random variable


\[ \mathrm{err}_{rel}(X) = \frac{X - \mathrm{Round}(X)}{X}. \]


We choose to express the distribution dist of relative errors in multiples of the
unit roundoff u. This choice is arbitrary, but it allows us to work with a dis-
tribution on the conceptually and numerically convenient interval [−1, 1], since
the absolute value of the relative error is strictly bounded by u (see Sect. 3.1),
rather than the interval [−u, u].

To compute the density function of dist, we proceed as described in Sect. 3.2
by first computing the CDF c(t) and then taking its derivative. Recall first from
Sect. 3.1 that $\mathrm{err}_{rel}(x) = 1$ if $x \in \lfloor 0, 0 \rceil \setminus \{0\}$, $\mathrm{err}_{rel}(x) = \infty$ if $x \in \lfloor -\infty, -\infty \rceil$,
$\mathrm{err}_{rel}(x) = -\infty$ if $x \in \lfloor \infty, \infty \rceil$, and $-u < \mathrm{err}_{rel}(x) < u$ elsewhere. Thus:


\[ P[\mathrm{err}_{rel}(X) = -\infty] = P[X \in \lfloor \infty, \infty \rceil], \qquad P[\mathrm{err}_{rel}(X) = 1] = P[X \in \lfloor 0, 0 \rceil], \]
\[ P[\mathrm{err}_{rel}(X) = \infty] = P[X \in \lfloor -\infty, -\infty \rceil]. \]


In other words, the probability measure corresponding to $\mathrm{err}_{rel}$ has three discrete
components at {−∞}, {1}, and {∞}, which cannot be accounted for by a PDF
(see Sect. 3.2). It follows that the probability measure dist is given by


\[ \mathit{dist} = \mathit{dist}_c + P[X \in \lfloor 0, 0 \rceil]\,\delta_1 + P[X \in \lfloor -\infty, -\infty \rceil]\,\delta_{\infty} + P[X \in \lfloor \infty, \infty \rceil]\,\delta_{-\infty} \tag{4} \]
[Figure 1: six panels, each with horizontal axis from -1.00 to 1.00. Panel titles:
U(2.0,4.0), KS-test=0.0034, p-val=0.1947; U(2.0,4.0), KS-test=0.0031, p-val=0.2926;
U(7,8), KS-test=0.0007, p-val=0.6797; U(4,5), KS-test=0.0008, p-val=0.6132;
U(4,32), KS-test=0.0009, p-val=0.345; N(0.0,1.0), KS-test=0.0007, p-val=0.7447]


Fig. 1. Theoretical vs. empirical error distribution, clockwise from top-left: (i) Eq.
(5) for Unif(2,4) with 3-bit exponent, 4-bit significand, (ii) Eq. (5) for Unif(2,4) in half-
precision, (iii) Eq. (6) for Unif(7,8) in single-precision, (iv) Eq. (6) for Unif(4,5) in
single-precision, (v) Eq. (6) for Unif(4,32) in single-precision, (vi) Eq. (6) for Norm(0,1)
in single-precision.


where $\mathit{dist}_c$ is a continuous measure that is not quite a probability measure
since its total mass is $1 - P[X \in \lfloor 0, 0 \rceil] - P[X \in \lfloor -\infty, -\infty \rceil] - P[X \in \lfloor \infty, \infty \rceil]$.
In general, $\mathit{dist}_c$ integrates to 1 in machine precision since $P[X \in \lfloor 0, 0 \rceil]$ is of
the order of the smallest positive floating-point representable number, and the
PDF of X rounds to 0 well before it reaches the smallest/largest floating-point
representable number. However, in order to be sound, we must in general include
these three discrete components in our computations. The density of $\mathit{dist}_c$ is given
explicitly by the following result, whose proof can be found in [9].


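Theorem 1. Let X be a real random variable with PDF f. The continuous part
$\mathrm{dist}_c$ of the distribution of $\mathrm{err}_{rel}(X)$ has a PDF given by

\[
d_c(t) = \sum_{z \in \mathbb{F} \setminus \{-\infty, 0, \infty\}}
1_{\lfloor\underline{z},\overline{z}\rceil}\!\left(\frac{z}{1-tu}\right)
f\!\left(\frac{z}{1-tu}\right) \frac{u\,|z|}{(1-tu)^2}
\tag{5}
\]

where $1_A(x)$ is the indicator function which returns 1 if $x \in A$ and 0 otherwise.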


Figure 1 (i) and (ii) show an implementation of Eq. (5) applied to the
distribution Unif(2, 4), first in very low precision (3 bit exponent, 4 bit
significand) and then in half-precision. The theoretical density is plotted
alongside a histogram of the relative error incurred when rounding 100,000
samples to low precision (computed in double-precision). The reported statistic
is the K-S (Kolmogorov-Smirnov) test, which measures the likelihood that a
collection of samples was drawn from a given distribution. This test reports
that we cannot reject the hypothesis that the samples are drawn from the
corresponding density. Note how in low precision the term in $\frac{1}{1-tu}$
induces a visible asymmetry on the central section of the distribution. This
effect is much less pronounced in half-precision.

For low precisions, say up to half-precision, it is computationally feasible 
to explicitly go through all floating-point numbers and compute the density of 
the roundoff error distribution dist directly from Eq. (5). However, this rapidly 
becomes prohibitively computationally expensive for higher precisions (since the 
number of floating-point representable numbers grows exponentially). 
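The empirical side of the comparison in Fig. 1 (ii) can be sketched in a few lines. This is only an illustration, not the authors' code: it assumes NumPy, uses IEEE half precision (`np.float16`) as the low-precision format, and takes the sign convention err_rel(x) = (x - Round(x)) / x, which matches err_rel(x) = 1 for nonzero x rounding to 0. The names `relative_rounding_errors`, `hist`, and `edges` are hypothetical.

```python
import numpy as np

def relative_rounding_errors(samples, dtype=np.float16):
    """Relative error (x - Round(x)) / x when rounding float64 samples to `dtype`."""
    rounded = samples.astype(dtype).astype(np.float64)
    return (samples - rounded) / samples

rng = np.random.default_rng(0)
x = rng.uniform(2.0, 4.0, size=100_000)   # samples of X ~ Unif(2, 4)
err = relative_rounding_errors(x)         # errors evaluated in double precision
u = 2.0 ** -11                            # unit roundoff of IEEE half precision

# Empirical density of the error expressed in units of u, on [-1, 1]
hist, edges = np.histogram(err / u, bins=50, range=(-1.0, 1.0), density=True)
```

The histogram `hist` is what one would plot against a theoretical density; the theoretical curve itself is not computed here.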


4.2 High-Precision Case 


As the working precision increases, a regime change occurs: on the one hand
it becomes practically impossible to enumerate all floating-point representable 
numbers as done in Eq. (5), but on the other hand sufficiently well-behaved den- 
sity functions are numerically close to being constant at the scale of an interval 
between two floating-point representable numbers. We exploit this smoothness 
to overcome the combinatorial limit imposed by Eq. (5). 


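Theorem 2. Let X be a real random variable with PDF f. The continuous part
$\mathrm{dist}_c$ of the distribution of $\mathrm{err}_{rel}(X)$ has a PDF given by
$d_c(t) = d_{hp}(t) + R(t)$, where $d_{hp}(t)$ is the function on $[-1, 1]$ defined by

\[
d_{hp}(t) =
\begin{cases}
\dfrac{1}{1-tu} \displaystyle\sum_{s,\,e=e_{min}+1}^{e_{max}-1}
\int_{(-1)^s 2^e (1-u)}^{(-1)^s 2^e (2-u)} \frac{|x|}{2^{e+1}}\, f(x)\, dx
& |t| \le \frac{1}{2} \\[2ex]
\dfrac{1}{1-tu} \displaystyle\sum_{s,\,e=e_{min}+1}^{e_{max}-1}
\int_{(-1)^s 2^e (1-u)}^{(-1)^s 2^e \left(\frac{1}{|t|}-u\right)} \frac{|x|}{2^{e+1}}\, f(x)\, dx
& \frac{1}{2} < |t| \le 1
\end{cases}
\tag{6}
\]

and $R(t)$ is an error whose total contribution $|R| \triangleq \int |R(t)|\, dt$ can be bounded by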


|R| < P[Round(X) = z(s, emin, k)] + P [Round(X) = 2(s, emaz, k)] + 
2e 


al E Wees + FE.) = 


8,€min <€<€max 


where for each exponent e and sign s, $\xi_{e,s}$ is a point in $[z(s,e,0),\, z(s,e,2^p - 1)]$
if $s = 0$ and in $[z(s,e,2^p - 1),\, z(s,e,0)]$ if $s = 1$.


Note how Eq. (6) reduces the sum over all floating-point representable num- 
bers in Eq. (5) to a sum over the exponents by exploiting the regularity of f. 
Note also that since f is a PDF, it usually decreases very quickly away from 0,
and its derivative decreases even quicker; $|R|$ thus tends to be very small, and
$|R| \to 0$ as the precision $p \to \infty$.

Figure 1 (iii) to (vi) show Eq. (6) for: (iii) the distribution Unif(7,8) where
large significands are more likely, (iv) the distribution Unif(4,5) where small
significands are more likely, (v) the distribution Unif(4,32) where significands
are equally likely, and (vi) the distribution Norm(0, 1) with infinite support.
The graphs show the density function given by Eq. (6) in single-precision versus
a histogram of the relative error incurred when rounding 1,000,000 samples to
single-precision (computed in double-precision). The K-S test reports that we
cannot reject the hypothesis that the samples are drawn from the corresponding
distributions.


4.3 Typical Distribution 


The distributions depicted in graphs (ii), (v) and (vi) of Fig. 1 are very
similar, despite being computed from very different input distributions. What
they have in common is that their input distributions have the property that all
significands in their supports are equally likely. We show that under this
assumption, the distribution of roundoff errors given by Eq. (5) converges to a
unique density as the precision increases, irrespective of the input
distribution! Since significands are frequently equiprobable (it is the case for
a third of our benchmarks), this density is of great practical importance. If
one had to choose ‘the’ canonical distribution for roundoff errors, we claim
that the density given below should be this distribution, and we therefore call
it the typical distribution; we depict it in Fig. 2 and formalize it with the
following theorem, which can mostly be found in [9].


Fig. 2. Typical distribution.


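Theorem 3. If X is a random variable such that
$\mathbb{P}[\mathrm{Round}(X) = z(s,e,k_0)] = \frac{1}{2^p}$ for any significand $k_0$, then

\[
d_{typ}(t) \triangleq \lim_{p \to \infty} d_c(t) =
\begin{cases}
\dfrac{3}{4} & |t| \le \frac{1}{2} \\[1.5ex]
\dfrac{1}{2}\left(\dfrac{1}{t} - 1\right) + \dfrac{1}{4}\left(\dfrac{1}{t} - 1\right)^2 & \frac{1}{2} < |t| \le 1
\end{cases}
\tag{7}
\]

where $d_c(t)$ is the exact density given by Eq. (5).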
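As a quick sanity check on the typical distribution, the sketch below evaluates a density of the form 3/4 on |t| <= 1/2 and (1/2)(1/t - 1) + (1/4)(1/t - 1)^2 on 1/2 < |t| <= 1, with t read in units of u, and integrates it numerically. The function names `d_typ` and `integrate` are hypothetical, and the midpoint rule is just a convenient choice; nothing here is taken from the PAF implementation.

```python
def d_typ(t):
    """Typical density; t is the relative error expressed in units of u."""
    a = abs(t)
    if a <= 0.5:
        return 0.75
    if a <= 1.0:
        r = 1.0 / a - 1.0          # the expression is even in t, so |t| can be used
        return 0.5 * r + 0.25 * r * r
    return 0.0

def integrate(f, lo, hi, n=200_000):
    """Plain midpoint-rule quadrature of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

total_mass = integrate(d_typ, -1.0, 1.0)   # should be numerically close to 1
central_mass = integrate(d_typ, -0.5, 0.5) # mass of the flat central section
```

The central section carries three quarters of the mass, and the two tails share the remaining quarter, so the function is a genuine probability density on [-1, 1].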


4.4 Covariance Structure 


The result above can be interpreted as saying that if X is such that all
mantissas are equiprobable, then X and $\mathrm{err}_{rel}(X)$ are asymptotically
independent (as $p \to \infty$). Much more generally, we now show that if a random
variable X has a sufficiently regular PDF, it is close to being uncorrelated
with $\mathrm{err}_{rel}(X)$. Formally, we prove that the covariance

\[
\mathrm{Cov}(X, \mathrm{err}_{rel}(X)) = \mathbb{E}[X \cdot \mathrm{err}_{rel}(X)] - \mathbb{E}[X]\,\mathbb{E}[\mathrm{err}_{rel}(X)]
\tag{8}
\]

is small, specifically of the order of u. Note that the expectation in the first
summand above is taken w.r.t. the joint distribution of X and $\mathrm{err}_{rel}(X)$.

The main technical obstacles to proving that the expression above is small
are that $\mathbb{E}[\mathrm{err}_{rel}(X)]$ turns out to be difficult to compute (we only
manage to bound it) and that the joint distribution
$\mathbb{P}[X \in A \wedge \mathrm{err}_{rel}(X) \in B]$ does not have a PDF since it is not
continuous w.r.t. the Lebesgue measure on $\mathbb{R}^2$. Indeed, it is supported by the
graph of the function $\mathrm{err}_{rel}$, which has Lebesgue measure 0. This does not
mean that it is impossible to compute the expectation

\[
\mathbb{E}[X \cdot \mathrm{err}_{rel}(X)] = \int_{\mathbb{R}^2} x\,u\,t \; d\mathbb{P}
\tag{9}
\]

but it is necessary to use some more advanced probability theory. We will make
the simplifying assumption that the density of X is constant on each interval
$\lfloor\underline{z},\overline{z}\rceil$ in order to keep the proof manageable. In practice
this is an extremely good approximation. Without this assumption, we would need
to add an error term similar to that of Theorem 2 to the expression below. This
is not conceptually difficult, but it is messy, and would distract from the main
aim of the following theorem, which is to bound $\mathbb{E}[\mathrm{err}_{rel}(X)]$, compute
$\mathbb{E}[X \cdot \mathrm{err}_{rel}(X)]$, and show that the covariance between X and
$\mathrm{err}_{rel}(X)$ is typically of the order of u.


Theorem 4. If the density of X is piecewise constant on intervals $\lfloor\underline{z},\overline{z}\rceil$, then


(1- IX] KE) < Cov(X, ertyei(X)) < (z- XK) 


sge s92e 3u? Emaz—1 1)°2°(2-u £ 
where L = X f((=1)°2°)(-1)°2 32 and K = "SS a pee a 


If the distribution of X is centered (i.e., $\mathbb{E}[X] = 0$) then L is the exact
value of the covariance, and it is worth noting that L is fundamentally an
artifact of the floating-point representation and is due to the fact that the
intervals $\lfloor\underline{2^e},\overline{2^e}\rceil$ are not symmetric. More generally, for
$\mathbb{E}[X]$ of the order of, say, 2, the covariance will be small (of the order of
u) as $K < 1$ (since $|x| < 2^{e+1}$ in each summand). For very large values of
$\mathbb{E}[X]$ it is worth noting that there is a high chance that L is also very
large, partially canceling $\mathbb{E}[X]$. An illustration of this is given by the
doppler benchmark examined in Sect. 7, an outlier as it has an input variable
with range [20, 20000]. Nevertheless, even for this benchmark the bounds of
Theorem 4 still give a small covariance of the order of 0.001.
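The near absence of correlation between X and its relative rounding error can also be observed empirically. The following sketch is an illustration under stated assumptions, not part of PAF: it assumes NumPy, rounds standard normal samples to single precision, and uses the convention err_rel(x) = (x - Round(x)) / x. The variable names are hypothetical, and the sample covariance is only an estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1_000_000)
x = x[x != 0.0]                                  # avoid dividing by zero

rounded = x.astype(np.float32).astype(np.float64)
err = (x - rounded) / x                          # relative roundoff error
u = 2.0 ** -24                                   # unit roundoff, single precision

# Sample covariance as in Eq. (8), and the corresponding correlation coefficient
cov = np.mean(x * err) - np.mean(x) * np.mean(err)
corr = cov / (np.std(x) * np.std(err))
```

Since the error is bounded by u in magnitude, the covariance cannot exceed std(X) times u; the quantity of interest is how much smaller than that bound the measured `cov` and `corr` turn out to be.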


4.5 Error Terms and P-Boxes 


In low-precision we can use the exact formula Eq. (5) to compute the error
distribution. However, in high-precision, approximations (typically extremely
good) like Eqs. (6) and (7) must be used. In order to remain sound in the
implementation of our model (see Sect. 6) we must account for the error made by
this approximation. We do not have the space to discuss the error made by
Eq. (7), but taking the term $|R|$ of Theorem 2 as an illustration, we can use the
notion of p-box described in Sect. 3.2 to create an object which soundly
approximates the error distribution. We proceed as follows: since $|R|$ bounds the
total error accumulated over all $t \in [-1, 1]$, we can soundly bound the CDF
$c(t)$ of the error distribution given by Eq. (6) by using the p-box

\[
c^{-}(t) = \max(0,\, c(t) - |R|) \quad \text{and} \quad c^{+}(t) = \min(1,\, c(t) + |R|)
\]
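The p-box construction amounts to shifting the CDF down and up by |R| and clamping to [0, 1]. A minimal sketch, assuming the CDF has already been sampled on a grid of t values; the function name `pbox_from_cdf` and the sample values are hypothetical.

```python
def pbox_from_cdf(cdf_values, r_bound):
    """Lower and upper CDF bounds obtained by shifting by r_bound and clamping."""
    lower = [max(0.0, c - r_bound) for c in cdf_values]
    upper = [min(1.0, c + r_bound) for c in cdf_values]
    return lower, upper

# CDF of the approximate error distribution sampled at a few points t in [-1, 1]
cdf = [0.0, 0.1, 0.5, 0.9, 1.0]
lower, upper = pbox_from_cdf(cdf, r_bound=0.05)
```

Any CDF that differs from `cdf` by at most `r_bound` pointwise lies between `lower` and `upper`, which is what makes the enclosure sound.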


5 Symbolic Affine Arithmetic 


In this section, we introduce symbolic affine arithmetic, which we employ to
generate the symbolic form for the roundoff error that we use in Sect. 6.3. Affine
arithmetic [6] is a model for range analysis that extends classic interval arith- 
metic [40] with information about linear correlations between operands. Sym- 
bolic affine arithmetic extends standard affine arithmetic by keeping the coeffi- 
cients of the noise terms symbolic. We define a symbolic affine form as 


\[
\hat{x} = x_0 + \sum_{i=1}^{n} x_i \epsilon_i, \quad \text{where } \epsilon_i \in [-1, 1]. \tag{10}
\]


We call $x_0$ the central symbol of the affine form, while $x_i$ are the symbolic
coefficients for the noise terms $\epsilon_i$. We can always convert a symbolic affine form
to its corresponding interval representation. This can be done using interval 
arithmetic or, to avoid precision loss, using a global optimizer. 

Affine operations between symbolic forms follow the usual rules, such as 


\[
\alpha\hat{x} + \beta\hat{y} + \zeta = \alpha x_0 + \beta y_0 + \zeta + \sum_{i=1}^{n} (\alpha x_i + \beta y_i)\,\epsilon_i
\]


Non-linear operations cannot be represented exactly using an affine form. Hence, 
we approximate them like in standard affine arithmetic [49]. 
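The affine rule above can be illustrated with a small numeric affine form. This sketch keeps the coefficients as plain floats rather than symbols, so it is standard affine arithmetic and not the symbolic variant used by PAF; the class `AffineForm` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class AffineForm:
    center: float                                 # x_0
    coeffs: dict = field(default_factory=dict)    # noise index i -> coefficient x_i

    def linear(self, alpha, other, beta, zeta=0.0):
        """Return alpha*self + beta*other + zeta, sharing noise symbols."""
        keys = set(self.coeffs) | set(other.coeffs)
        coeffs = {i: alpha * self.coeffs.get(i, 0.0) + beta * other.coeffs.get(i, 0.0)
                  for i in keys}
        return AffineForm(alpha * self.center + beta * other.center + zeta, coeffs)

    def to_interval(self):
        """Interval enclosure obtained by letting every noise term range over [-1, 1]."""
        radius = sum(abs(c) for c in self.coeffs.values())
        return (self.center - radius, self.center + radius)

x = AffineForm(1.0, {1: 0.5})            # x in [0.5, 1.5]
y = AffineForm(2.0, {1: 0.5, 2: 1.0})    # y in [0.5, 3.5], sharing noise symbol 1 with x
diff = x.linear(1.0, y, -1.0)            # x - y
```

Because x and y share noise symbol 1, its contribution cancels in `diff`, and the resulting interval [-2, 0] is tighter than the [-3, 1] that plain interval arithmetic would give.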


Sound Error Analysis with Symbolic Affine Arithmetic. We now show 
how the roundoff errors get propagated through the four arithmetic operations. 
We apply these propagation rules to an arithmetic expression to accurately keep 
track of the roundoff errors. Since the (absolute) roundoff error directly depends 
on the range of a computation, we describe range and error together as a pair 
(range: Symbol, $\widehat{err}$: Symbolic Affine Form). Here, range represents the
infinite-precision range of the computation, while $\widehat{err}$ is the symbolic affine form
for the roundoff error in floating-point precision. Unary operators (e.g., rounding) 
take as input a (range, error form) pair, and return a new output pair; binary 
operators take as input two pairs, one per operand. For linear operators, the 
ranges and errors get propagated using the standard rules of affine arithmetic. 

For the multiplication, we distribute each term in the first operand to every 
term in the second operand: 


\[
(x, \widehat{err}_x) * (y, \widehat{err}_y) = (x * y,\;\; x * \widehat{err}_y + y * \widehat{err}_x + \widehat{err}_x * \widehat{err}_y)
\]


The output range is the product of the input ranges and the remaining terms 
contribute to the error. Only the last (quadratic) expression cannot be
represented exactly in symbolic affine arithmetic; we bound such non-linearities using
a global optimizer. The division is computed as the term-wise multiplication of
the numerator with the inverse of the denominator. Hence, we need the inverse 
of the denominator error form, and then we can proceed as for multiplication. To 
compute the inverse, we leverage the symbolic expansion used in FPTaylor [46]. 

Finally, after every operation we apply the unary rounding operator from 
Eq. (2). The infinite-precision range is not affected by rounding. The rounding 
operator appends a fresh noise term to the symbolic error form. The coefficient 
for the new noise term is the (symbolic) floating-point range given by the sum 
of the input range with the input error form. 
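To make the propagation rules concrete, the sketch below tracks (range, error) pairs with plain intervals instead of symbolic affine forms, which loses the correlation information but keeps the structure of the multiplication and rounding rules. It assumes that rounding adds an absolute error of at most u times the magnitude of the floating-point range (range plus accumulated error), and it does not round interval endpoints outward, so it is not itself rigorous. All function names are hypothetical.

```python
U = 2.0 ** -24  # unit roundoff of single precision

def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def magnitude(a):
    return max(abs(a[0]), abs(a[1]))

def round_op(rng, err):
    """Rounding leaves the exact range untouched and adds a fresh error term."""
    bound = U * magnitude(iadd(rng, err))      # u * |floating-point range|
    return rng, iadd(err, (-bound, bound))

def mul(x, y):
    """(x, err_x) * (y, err_y) followed by rounding of the result."""
    (rx, ex), (ry, ey) = x, y
    err = iadd(iadd(imul(rx, ey), imul(ry, ex)), imul(ex, ey))
    return round_op(imul(rx, ry), err)

x = round_op((1.0, 2.0), (0.0, 0.0))   # input rounded on entry
y = round_op((3.0, 4.0), (0.0, 0.0))
z_range, z_err = mul(x, y)
```

Here `z_range` is the exact product range, while `z_err` accumulates the two input rounding errors, their quadratic interaction, and the rounding of the product.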


[Figure 3: toolflow diagram. Recoverable labels: "Input File" containing
x=Distrib(Range) and f(x)=Expr; "f(x) distribution"; "abs err distribution";
"err distribution".]

Fig. 3. Toolflow of PAF.


6 Algorithm and Implementation 


In this section, we describe our probabilistic model of floating-point arithmetic 
and how we implement it in a prototype named PAF (for Probabilistic Analysis 
of Floating-point errors). Figure 3 shows the toolflow of PAF. 


6.1 Probabilistic Model 


PAF takes as input a text file describing a probabilistic floating-point compu- 
tation and its input distributions. The kinds of computations we support are 
captured with this simple grammar: 


\[
t ::= z \mid x_i \mid t \;\mathrm{op}_m\; t \qquad z \in \mathbb{F},\; i \in \mathbb{N},\; \mathrm{op}_m \in \{+, -, \times, \div\}
\]


Following [8,31], we interpret each computation t given by the grammar as a
random variable. We define the interpretation map $[\![-]\!]$ over the computation
tree inductively. The base case is given by
$[\![z(s,e,k)]\!] = (-1)^s 2^e (1 + k 2^{-p})$ and $[\![x_i]\!] = X_i$, where the real
numbers $[\![z(s,e,k)]\!]$ are understood as constant random variables and each
$X_i$ is a random input variable with a user-specified distribution. Currently,
PAF supports several well-known distributions out-of-the-box (e.g., uniform,
normal, exponential), and the user can also define custom distributions as
piecewise functions. For the inductive case $[\![t_1 \;\mathrm{op}_m\; t_2]\!]$, we put
the lessons from Sect. 4 to work. Recall first the probabilistic model from
Eq. (3):

\[
x \;\mathrm{op}_m\; y = (x \;\mathrm{op}\; y)(1 + \delta), \qquad \delta \sim \mathrm{dist}
\]

In Sect. 4.1, we showed that dist should be taken as the distribution of the actual
roundoff errors of the random elements $(x \;\mathrm{op}\; y)$. We therefore define:

\[
[\![t_1 \;\mathrm{op}_m\; t_2]\!] = ([\![t_1]\!] \;\mathrm{op}\; [\![t_2]\!]) \times (1 + \mathrm{err}_{rel}([\![t_1]\!] \;\mathrm{op}\; [\![t_2]\!]))
\tag{11}
\]


To evaluate the model of Eq. (11), we first use the appropriate closed-form
expression among Eqs. (5) to (7) derived in Sect. 4 to evaluate the distribution
of the random variable $\mathrm{err}_{rel}([\![t_1]\!] \;\mathrm{op}\; [\![t_2]\!])$, or the
corresponding p-box as described in Sect. 4.5. We then use Theorem 4 to justify
evaluating the multiplication operation in Eq. (11) independently (that is to
say, by using [48]), since the roundoff process is very close to being
uncorrelated with the process generating it. The validity of this assumption is
also confirmed experimentally by the remarkable agreement of Monte-Carlo
simulations with this analytical model.

We now introduce the algorithm for evaluating the model given in Eq. (11). 
The evaluation performs an in-order (LNR) traversal of the Abstract Syntax 
Tree (AST) of a computation given by our grammar, and it feeds the results 
to the parent level along the way. At each node, it computes the probabilistic 
range of the intermediate result using the probabilistic ranges computed for its 
children nodes (i.e., operands). We first determine whether the operands are 
independent or not (Ind? branch in the toolflow), and we either apply a cheaper 
(i.e., no SMT solver invocations) algorithm if they are independent (see below) or 
a more involved one (see Sect. 6.2) if they are not. We describe our methodology 
at a generic intermediate computation in the AST of the expression. 

We consider two distributions X and Y discretized into DS-structures $DS_X$
and $DS_Y$ (Sect. 3.2), and we want to derive the DS-structure $DS_Z$ for
$Z = X \;\mathrm{op}\; Y$, $\mathrm{op} \in \{+, -, \times, \div\}$. Together with the
DS-structures of the operands, we also need the traces $trace_X$ and $trace_Y$
containing the history of the operations performed so far, one for each
operand. A trace is constructed at each leaf of the
AST with the input distributions and their range. It is then propagated to the 
parent level and populated at each node with the current operation. Such history 
traces are critical when dealing with dependent operations since they allow us 
to interrogate an SMT solver about the feasibility of the current operation, as 
we describe in the next section. When the operands are independent, we simply 
use the arithmetic operations on independent DS-structures [3]. 
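For independent operands, the combination of two DS-structures can be sketched as a Cartesian product of focal elements: each pair of intervals is combined with interval arithmetic and receives the product of the two probabilities. This is the usual construction for independent Dempster-Shafer structures and is shown here only as an illustration; the representation (lists of `((lo, hi), prob)` pairs) and the names `interval_op` and `indep_op` are assumptions, not PAF's data structures.

```python
from itertools import product

def interval_op(a, b, op):
    """Interval arithmetic for the four basic operations."""
    (a1, a2), (b1, b2) = a, b
    if op == "+":
        return (a1 + b1, a2 + b2)
    if op == "-":
        return (a1 - b2, a2 - b1)
    if op == "*":
        c = [a1 * b1, a1 * b2, a2 * b1, a2 * b2]
        return (min(c), max(c))
    if op == "/":
        if b1 <= 0.0 <= b2:
            raise ZeroDivisionError("divisor interval contains zero")
        return interval_op(a, (1.0 / b2, 1.0 / b1), "*")
    raise ValueError(f"unsupported operation: {op}")

def indep_op(ds_x, op, ds_y):
    """DS-structure of Z = X op Y when X and Y are independent."""
    return [(interval_op(ix, iy, op), px * py)
            for (ix, px), (iy, py) in product(ds_x, ds_y)]

ds_x = [((0.0, 1.0), 0.5), ((1.0, 2.0), 0.5)]
ds_y = [((2.0, 3.0), 0.25), ((3.0, 5.0), 0.75)]
ds_z = indep_op(ds_x, "+", ds_y)
```

The output has one focal element per pair of input focal elements, and its probabilities again sum to 1.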


6.2 Computing Probabilistic Ranges for Dependent Operands 


When the operands are dependent, we start by assuming that the dependency is
unknown. This assumption is sound because the dependency of the operation is
included in the set of unknown dependencies, while the result of the operation is
no longer a single distribution but a p-box. Due to this “unknown assumption”,
the CDFs of the output p-box are a very pessimistic over-approximation of
the operation, i.e., they are far from each other. Our key insight is to use an
SMT solver to prune infeasible combinations of intervals from the input
DS-structures, which prunes regions of zero probability from the output p-box. This
probabilistic pruning using a solver squeezes together the CDFs of the output
p-box, often resulting in a much more accurate over-approximation. With the
solver, we move from an unknown to a partially known dependency between the
operands. Currently, PAF supports the Z3 [17] and dReal [23] SMT solvers.


Algorithm 1. Dependent Operation Z = X op Y

 1: function DEP_OP(DS_X, op, DS_Y, trace_X, trace_Y)
 2:     DS_Z = list()
 3:     for all ([x1, x2], p_x) ∈ DS_X do
 4:         for all ([y1, y2], p_y) ∈ DS_Y do
 5:             [z1, z2] = [x1, x2] op [y1, y2]        ▷ operation between intervals
 6:             [z1, z2] = SMT.prune([z1, z2])
 7:             if SMT.check(trace_X ∧ trace_Y ∧ [x1, x2] ∧ [y1, y2]) is SAT then
 8:                 p_Z = unknown-probability
 9:             else
10:                 p_Z = 0
11:             DS_Z.append(([z1, z2], p_Z))
12:     trace_Z = trace_X ∪ trace_Y ∪ {Z = X op Y}
13:     return DS_Z, trace_Z

Algorithm 1 shows the pseudocode of our algorithm for computing the proba- 
bilistic output range (i.e., DS-structure) for dependent operands. When dealing 
with dependent operands, interval arithmetic (line 5) might not be as precise 
as in the independent case. Hence, we use an SMT solver to prune away any 
over-approximations introduced by interval arithmetic when computing with 
dependent ranges (line 6); this use of the solver is orthogonal to the one dealing 
with probabilities. On line 7, we check with an SMT solver whether the current 
combination of ranges [x1, x2] and [y1, y2] is compatible with the traces of the
operands. If the query is satisfiable, the probability is strictly greater than zero 
but currently unknown (line 8). If the query is unsatisfiable, we assign a proba- 
bility of zero to the range in DSz (line 10). Finally, we append a new range to 
the DS-structure DSz (line 11). Note that the loops are independent, and hence 
in our prototype implementation we run them in parallel. 
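The control flow of Algorithm 1 can be sketched without committing to a particular solver by passing the feasibility check in as a callable. In this illustration, `is_feasible` stands in for the SMT query of line 7, the pruning step of line 6 and the trace bookkeeping are omitted, and `UNKNOWN` marks the probabilities to be resolved later; all of these names are hypothetical and none of this is PAF's actual interface.

```python
UNKNOWN = None  # placeholder for a probability that is positive but not yet known

def dep_op(ds_x, ds_y, interval_op, is_feasible):
    """Combine dependent DS-structures, zeroing out infeasible interval pairs."""
    ds_z = []
    for ix, _px in ds_x:
        for iy, _py in ds_y:
            iz = interval_op(ix, iy)                      # operation between intervals
            pz = UNKNOWN if is_feasible(ix, iy) else 0.0  # stand-in for the SAT check
            ds_z.append((iz, pz))
    return ds_z

def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def same_variable(ix, iy):
    """Feasible only if one value can lie in both intervals (operands are the same X)."""
    return max(ix[0], iy[0]) <= min(ix[1], iy[1])

# Z = X - X, where X is supported on [0, 1] and [2, 3] with equal probability
ds_x = [((0.0, 1.0), 0.5), ((2.0, 3.0), 0.5)]
ds_z = dep_op(ds_x, ds_x, sub, same_variable)
```

In this toy case the two mixed combinations are infeasible and receive probability 0, so the output ranges [-3, -1] and [1, 3] are discarded from the p-box; a real solver-based pruning step could additionally tighten the surviving range [-1, 1].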

After this algorithm terminates, we still need to assign probability values to
all the unknown-probability ranges in $DS_Z$. Since we cannot assign an exact
value, we compute a range of potential values $[p_{z_{min}}, p_{z_{max}}]$ instead. This
computation is encoded as a linear programming routine exactly as in [3].


6.3 Computing Conditional Roundoff Error 


The final step of our toolflow computes the conditional roundoff error by
combining the symbolic affine arithmetic error form of the computation (see
Sect. 5) with the probabilistic range analysis described above. The symbolic
error form gets maximized conditioned on the results of all the intermediate
operations landing in the given confidence interval (e.g., 99%) of their
respective ranges (computed as described in the previous section). Note that
conditioning only on the last operation of the computation tree (i.e., the AST
root) would lead to an extremely pessimistic over-approximation since all the
outliers in the intermediate operations would be part of the maximization
routine. This would lead to our tool PAF computing pessimistic error bounds
typical of worst-case analyzers.


Algorithm 2. Conditional Roundoff Error Computation

 1: function COND_ERR(DSS, errorForm, confidence)
 2:     allRanges = list()
 3:     for all DS_i ∈ DSS do
 4:         focals = sorted(DS_i, key = prob, order = descending)
 5:         accumulator = 0
 6:         ranges = ∅
 7:         for all ([x1, x2], p_x) ∈ focals do
 8:             accumulator = accumulator + p_x
 9:             ranges = ranges ∪ [x1, x2]
10:             if accumulator ≥ confidence then
11:                 allRanges.append(ranges)
12:                 break
13:     error = maximize(errorForm, allRanges)
14:     return error


Algorithm 2 shows the pseudocode of the roundoff error computation algorithm.
The algorithm takes as input a list DSS of DS-structures (one for each
intermediate result range in the computation), the generated symbolic error 
form, and a confidence interval. It iterates over all intermediate DS-structures 
(line 3), and for each it determines the ranges needed to support the chosen confi- 
dence intervals (lines 4-12). In each iteration, it sorts the list of range-probability 
pairs (i.e., focal elements) of the current DS-structure by their probability value 
in descending order (line 4). This is a heuristic that prioritizes the focal
elements with most of the probability mass and keeps the unlikely outliers that
cause large roundoff errors out of the final error computation. With the help of an
accumulator (line 8), we keep collecting focal elements (line 9) until the accumu- 
lated probability satisfies the confidence interval (line 10). Finally, we maximize 
the error form conditioned to the collected ranges of intermediate operations (line 
13). The maximization is done using the rigorous global optimizer Gelpia [24]. 
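The selection of focal elements in Algorithm 2 can be sketched as follows. The optimizer is passed in as a callable, since the interface of the rigorous global optimizer is not described here; the toy `maximize` below simply evaluates a convex error form at interval endpoints and is an assumption made for the example. Unlike the pseudocode, `select_ranges` returns the collected ranges even if the confidence level is never reached.

```python
def select_ranges(ds, confidence):
    """Collect the most probable focal elements until `confidence` mass is covered."""
    focals = sorted(ds, key=lambda fe: fe[1], reverse=True)
    accumulator, ranges = 0.0, []
    for interval, prob in focals:
        accumulator += prob
        ranges.append(interval)
        if accumulator >= confidence:
            break
    return ranges

def cond_err(dss, error_form, confidence, maximize):
    """Maximize the error form over the ranges kept for each intermediate result."""
    all_ranges = [select_ranges(ds, confidence) for ds in dss]
    return maximize(error_form, all_ranges)

U = 2.0 ** -24
error_form = lambda v: U * abs(v)          # hypothetical error form of a single rounding

def maximize(form, all_ranges):
    # Valid only because `form` is convex: its maximum over an interval is at an endpoint.
    return max(form(end) for ranges in all_ranges for iv in ranges for end in iv)

ds = [((0.0, 1.0), 0.6), ((1.0, 2.0), 0.395), ((2.0, 100.0), 0.005)]
kept = select_ranges(ds, 0.99)
bound_99 = cond_err([ds], error_form, 0.99, maximize)
bound_100 = cond_err([ds], error_form, 1.0, maximize)
```

Dropping the single low-probability focal element [2, 100] reduces the bound from 100u to 2u, which is the effect the conditional analysis is designed to capture.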


7 Experimental Evaluation 


We evaluate PAF (version 1.0.0) on the standard FPBench benchmark suite [11,
20] that uses the four basic operations we currently support {+, −, ×, ÷}. Many
of these benchmarks were also used in recent related work [36] that we compare
against. The benchmarks come from a variety of domains: embedded software
(bsplines), linear classifications (classids), physics computations (dopplers),
filters (filters), controllers (traincars, rigidBody), polynomial approximations of
functions (sine, sqrt), solving equations (solvecubic), and global optimizations
(trids). Since FPBench has been primarily used for worst-case roundoff error
analysis, the benchmarks come with ranges for input variables, but they do
not specify input distributions. We instantiate the benchmarks with three
well-known distributions for all the inputs: uniform, standard normal distribution,
and double exponential (Laplace) distribution with σ = 0.01 which we will call
‘exp’. The normal and exp distributions get truncated to the given range. We
assume single-precision floating-point format for all operands and operations.

To assess the accuracy and performance of PAF, we compare it with PrAn 
(commit 7611679 [10]), the current state-of-the-art tool for automated analysis 
of probabilistic roundoff errors [36]. PrAn currently supports only uniform and 
normal distributions. We run all 6 tool configurations and report the best result 
for each benchmark. We fix the number of intervals in each discretization to 50 to 
match PrAn. We choose 99% as the confidence interval for the computation of our 
conditional roundoff error (Sect. 6.3) and of PrAn’s probabilistic error. We also
compare our probabilistic error bounds against FPTaylor (commit efbbc83 [21]),
which performs worst-case roundoff error analysis, and hence it does not take 
into account the distributions of the input variables. We ran our experiments in 
parallel on a 4-socket 2.2 GHz 8-core Intel Xeon E5-4620 machine. 

Table 2 compares roundoff errors reported by PAF, PrAn, and FPTaylor.
PAF outperforms PrAn by computing tighter probabilistic error bounds on 
almost all benchmarks, occasionally by orders of magnitude. In the case of uni- 
form input distributions, PAF provides tighter bounds for 24 out of 27 bench- 
marks, for 2 benchmarks the bounds from PrAn are tighter, while for sqrt they 
are the same. In the case of normal input distributions, PAF provides tighter 
bounds for all the benchmarks. Unlike PrAn, PAF supports probabilistic output 
range analysis as well. We present these results in the extended version [7]. 

In Table 2, of particular interest are benchmarks (10 for normal and 18 for 
exp) where the error bounds generated by PAF for the 99% confidence interval 
are at least an order of magnitude tighter than the worst-case bounds generated 
by FPTaylor. For such a benchmark and input distribution, PAF’s results inform
a user that there is an opportunity to optimize the benchmark (e.g., by reducing 
precision of floating-point operations) if their use-case can handle at most 1% of 
inputs generating roundoff errors that exceed a user-provided bound. FPTaylor’s 
results, on the other hand, do not allow for a user to explore such fine-grained 
trade-offs since they are worst-case and do not take probabilities into account. 

In general, we see a gradual reduction of the errors transitioning from uniform 
to normal to exp. When the input distributions are uniform, there is a significant 
chance of generating a roundoff error of the same order of magnitude as the worst- 
case error, since all inputs are equally likely. The standard normal distribution 
concentrates more than 99% of probability mass in the interval [—3, 3], resulting 
in the long tail phenomenon, where less than 0.5% of mass spreads in the interval 
[3, ∞]. When the normal distribution gets truncated in a neighborhood of zero
(e.g., [0,1] for bsplines and filters) nothing changes with respect to the uniform 
case—there is still a high chance of committing errors close to the worst-case. 
Table 2. Roundoff error bounds reported by PAF, PrAn, and FPTaylor given uniform
(uni), normal (norm), and Laplace (exp) input distributions. We set the confidence
interval to 99% for PAF and PrAn, and mark the smallest reported roundoff errors for
each benchmark in bold. Asterisk (*) highlights a difference of more than one order of
magnitude between PAF and FPTaylor.


Benchmark  | Uniform  | Uniform  | Normal    | Normal   | Exp       | FPTaylor
           | PAF      | PrAn     | PAF       | PrAn     | PAF       |
-----------+----------+----------+-----------+----------+-----------+---------
bspline0   | 5.71e-08 | 6.12e-08 | 5.71e-08  | 6.12e-08 | 5.71e-08  | 5.72e-08
bspline1   | 1.86e-07 | 2.08e-07 | 1.86e-07  | 2.08e-07 | 6.95e-08  | 1.93e-07
bspline2   | 1.94e-07 | 2.13e-07 | 1.94e-07  | 2.13e-07 | 2.11e-08  | 2.10e-07
bspline3   | 4.22e-08 | 4.65e-08 | 4.22e-08  | 4.65e-08 | 7.62e-12* | 4.22e-08
classids0  | 6.93e-06 | 8.65e-06 | 4.45e-06  | 8.64e-06 | 1.70e-06  | 6.85e-06
classids1  | 3.71e-06 | 4.63e-06 | 2.68e-06  | 4.62e-06 | 7.62e-07  | 3.62e-06
classids2  | 5.23e-06 | 7.32e-06 | 3.85e-06  | 7.32e-06 | 1.46e-06  | 5.15e-06
doppler1   | 7.95e-05 | 1.17e-04 | 5.08e-07* | 1.17e-04 | 4.87e-07* | 6.10e-05
doppler2   | 1.43e-04 | 2.45e-04 | 6.61e-07* | 2.45e-04 | 6.28e-07* | 1.11e-04
doppler3   | 4.55e-05 | 5.12e-05 | 9.11e-07* | 5.12e-05 | 8.95e-07* | 3.41e-05
filter1    | 1.25e-07 | 2.03e-07 | 1.25e-07  | 2.03e-07 | 5.43e-09* | 1.25e-07
filter2    | 7.93e-07 | 1.01e-06 | 6.13e-07  | 1.01e-06 | 2.90e-08* | 7.93e-07
filter3    | 2.34e-06 | 2.86e-06 | 2.05e-06  | 2.87e-06 | 1.09e-07* | 2.23e-06
filter4    | 4.15e-06 | 5.20e-06 | 4.15e-06  | 5.20e-06 | 4.61e-07  | 3.81e-06
rigidbody1 | 1.74e-04 | 1.58e-04 | 6.14e-06* | 1.58e-04 | 4.80e-07* | 1.58e-04
rigidbody2 | 1.96e-02 | 9.70e-03 | 5.99e-05* | 9.70e-03 | 9.55e-07* | 1.94e-02
sine       | 2.37e-07 | 2.40e-07 | 2.37e-07  | 2.40e-07 | 1.49e-08* | 2.38e-07
solvecubic | 1.78e-05 | 1.83e-05 | 6.84e-06  | 1.83e-05 | 2.76e-06  | 1.60e-05
sqrt       | 1.54e-04 | 1.54e-04 | 1.10e-06* | 1.54e-04 | 2.46e-07* | 1.51e-04
traincars1 | 1.76e-03 | 1.96e-03 | 8.26e-04  | 1.96e-03 | 4.50e-04  | 1.74e-03
traincars2 | 1.04e-03 | 1.36e-03 | 3.61e-04  | 1.36e-03 | 2.83e-05* | 9.46e-04
traincars3 | 1.75e-02 | 2.29e-02 | 9.56e-03  | 2.29e-02 | 8.95e-04* | 1.80e-02
traincars4 | 1.81e-01 | 2.30e-01 | 8.87e-02  | 2.30e-01 | 7.33e-03* | 1.81e-01
trid1      | 6.01e-03 | 6.03e-03 | 1.58e-05* | 6.03e-03 | 1.58e-05* | 6.06e-03
trid2      | 1.03e-02 | 1.17e-02 | 2.42e-05* | 1.17e-02 | 2.43e-05* | 1.03e-02
trid3      | 1.75e-02 | 1.95e-02 | 6.80e-05* | 1.95e-02 | 6.77e-05* | 1.75e-02
trid4      | 2.69e-02 | 2.88e-02 | 2.64e-04* | 3.03e-02 | 2.64e-04* | 2.66e-02


However, when the normal distribution gets truncated to a wider range (e.g., 
[—100, 100] for trids), then the outliers causing large errors are very rare events, 
not included in the 99% confidence interval. The exponential distribution further 
compresses the 99% probability mass in the tiny interval [—0.01, 0.01], so the long 
tails effect is common among all the benchmarks. 
Fig. 4. CDFs of the range (left) and error (right) distributions for the benchmark
traincars3 for uniform (top), normal (center), and exp (bottom).


The runtimes of PAF range from 10 min for small benchmarks, such as
bsplines, to several hours for benchmarks with more than 30 operations, such
as trid4; they are always less than two hours, except for trids with 11 h and
filters with 6 h. The runtime of PAF is usually dominated by Z3 invocations,
and the long runtimes are caused by numerous Z3 timeouts that the respective
benchmarks induce. The runtimes of PrAn are comparable to those of PAF: they
are always less than two hours, except for trids with 3 h, sqrt with 3 h, and sine
with 11 h. Note that neither PAF nor PrAn is memory intensive.

To assess the quality of our rigorous (i.e., sound) results, we implement Monte 
Carlo sampling to generate both roundoff error and output range distributions. 
The procedure consists of randomly sampling from the provided input
distributions, evaluating the floating-point computation in both the specified and
high-precision (e.g., double-precision) floating-point regimes to measure the roundoff
error, and finally partitioning the computed errors into bins to get an approx- 
imation (i.e., histogram) of the PDF. Of course, Monte Carlo sampling does 
not provide rigorous bounds, but is a useful tool to assess how far the rigorous 
bounds computed statically by PAF are from an empirical measure of the error. 
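The sampling procedure can be sketched as follows, assuming NumPy, single precision as the specified format, and double precision as the reference. The benchmark expression `f`, the sampler, and the function name `monte_carlo_abs_error` are hypothetical; the sketch also counts the rounding of the inputs themselves as part of the error, which is a modeling choice made for this example.

```python
import numpy as np

def monte_carlo_abs_error(f, sampler, n=100_000, seed=0):
    """Absolute roundoff error of f evaluated in float32 against a float64 reference."""
    rng = np.random.default_rng(seed)
    x64 = sampler(rng, n)                    # inputs drawn in double precision
    reference = f(x64)                       # high-precision evaluation
    low = f(x64.astype(np.float32))          # NumPy keeps float32 for array arithmetic
    return np.abs(low.astype(np.float64) - reference)

f = lambda x: x * x + x                      # hypothetical benchmark expression
errors = monte_carlo_abs_error(f, lambda rng, n: rng.uniform(0.0, 1.0, n))

# Histogram approximating the PDF of the error, and an empirical 99% quantile
hist, edges = np.histogram(errors, bins=50, density=True)
q99 = np.quantile(errors, 0.99)
```

The empirical quantile `q99` is the kind of non-rigorous reference value against which a statically computed 99% bound can be compared.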

Figure 4 shows the effects of the input distributions on the output and round- 
off error ranges of the traincars3 benchmark. In the error graphs (right column), 
we show the Monte Carlo sampling evaluation (yellow line) together with the 
error bounds from PAF with 99% confidence interval (red plus symbol) and 
FPTaylor’s worst-case bounds (green crossmark). In the range graphs (left col- 
umn), we also plot PAF’s p-box over-approximations. We can observe that in the 
case of uniform inputs the computed p-boxes overlap at the extrema of the out- 
put range. This phenomenon makes it impossible to distinguish between 99% and 
100% confidence intervals, and hence as expected the bound reported by PAF is 
almost identical to FPTaylor’s. This is not the case for normal and exponential 
distributions, where PAF can significantly improve both the output range and 
error bounds over FPTaylor. This again illustrates how pessimistic the bounds 
from worst-case tools can be when the information about the input distributions 
is not taken into account. Finally, the graphs illustrate how the p-boxes and 
error bounds from PAF follow their respective empirical estimations. 


8 Related Work 


Our work draws inspiration from probabilistic affine arithmetic [3,4], which aims
to bound probabilistic uncertainty propagated through a computation, a goal
similar to that of our probabilistic range analysis. This was recently extended to
polynomial dependencies [45]. On the other hand, PAF detects any non-linear
dependency supported by the SMT solver. While these approaches show how to bound
moments, we do not consider moments but instead compute conditional roundoff 
error bounds, a concern specific to the analysis of floating-point computations. 
Finally, concentration of measure inequalities [4,45] provide bounds for
(possibly very large) problems that can be expressed as sums of random variables,
for example multiple increments of a noisy dynamical system, but are unsuitable
for typical floating-point computations (such as FPBench benchmarks).

The most similar approach to our work is the recent static probabilistic 
roundoff error analysis called PrAn [36]. PrAn also builds on [3], and inherits the 
same limitations in dealing with dependent operations. Like us, PrAn hinges on 
a discretization scheme that builds p-boxes for both the input and error distribu- 
tions and propagates them through the computation. The question of how these 
p-boxes are chosen is left open in the PrAn approach. In contrast, we take the 
input variables to be user-specified random variables, and show how the distri- 
bution of each error term can be computed directly and exactly from the random 
variables generating it (Sect. 4). Furthermore, unlike PrAn, PAF leverages the
non-correlation between random variables and the corresponding error distribu- 
tion (Sect. 4.4). Thus, PAF performs the rounding in Eq. (3) as an independent 
operation. Putting these together leads to PAF computing tighter probabilistic
roundoff error bounds than PrAn, as our experiments show (Sect. 7). 

The idea of using a probabilistic model of rounding errors to analyze deter- 
ministic computations can be traced back to Von Neumann and Goldstine [51]. 
Parker’s so-called ‘Monte Carlo arithmetic’ [41] is probably the most detailed 
description of this approach. We, however, consider probabilistic computations. 
For this reason, the famous critique of the probabilistic approach to roundoff 
errors [29] does not apply to this work. Our preliminary report [9] presents some 
early ideas behind this work, including Eqs. (5) and (7) and a very rudimentary 
range analysis. However, this early work manipulated distributions unsoundly, 
could not handle any repeated variables, and did not provide any roundoff error 
analysis. Recently, probabilistic roundoff error models have also been investi- 
gated using the concentration of measure inequalities [27,28]. Interestingly, this 
means that the distribution of errors in Eq. (3) can be left almost completely 
unspecified. However, as in the case of related work from the beginning of this 
section [4,45], concentration inequalities are very ill-suited to the applications 
captured by the FPBench benchmark suite. 

Worst-case analysis of roundoff errors has been an active research area with 
numerous published approaches [12-16,18,22,33,35,37,38,46,47,50]. Our symbolic
affine arithmetic used in PAF (Sect. 5) evolved from rigorous affine arithmetic
[14] by keeping the coefficients of the noise terms symbolic, which often
leads to improved precision. These symbolic terms are very similar to the first- 
order Taylor approximations of the roundoff error expressions used in FPTay- 
lor [46,47]. Hence, PAF with the 100% confidence interval leads to the same 
worst-case roundoff error bounds as computed by FPTaylor (Sect. 7). 
Abstract. We study reinforcement learning for the optimal control of
Branching Markov Decision Processes (BMDPs), a natural extension of
(multitype) Branching Markov Chains (BMCs). The state of a (discrete-time)
BMC is a collection of entities of various types that, while
spawning other entities, generate a payoff. In comparison with BMCs,
where the evolution of each entity of the same type follows the same
probabilistic pattern, BMDPs allow an external controller to pick from 
a range of options. This permits us to study the best/worst behaviour of 
the system. We generalise model-free reinforcement learning techniques 
to compute an optimal control strategy of an unknown BMDP in the 
limit. We present results of an implementation that demonstrate the 
practicality of the approach. 


1 Introduction 


Branching Markov Chains (BMCs), also known as Branching Processes, are 
natural models of population dynamics and parallel processes. The state of a 
BMC consists of entities of various types, and many entities of the same type 
may coexist. Each entity can branch in a single step into a (possibly empty) set 
of entities of various types while disappearing itself. This assumption is natural, 
for instance, for annual plants that reproduce only at a specific time of the year, 
or for bacteria, which either split or die. An entity may spawn a copy of itself, 
thereby simulating the continuation of its existence. 

The offspring of an entity is chosen at random among options according to a 
distribution that depends on the type of the entity. The type captures significant 
differences between entities. For example, stem cells are very different from 
regular cells; parallel processes may be interruptible or have different privileges.
The type may reflect characteristics of the entities such as their age or size. 

Although entities coexist, the BMC model assumes that there is no 
interaction between them. Thus, how an entity reproduces and for how long 
it lives is the same as if it were the only entity in the system. This assumption 
greatly improves the computational complexity of the analysis of such models 
and is appropriate when the population exists in an environment that has 
virtually unlimited resources to sustain its growth. This is a common situation 
that holds when a species has just been introduced into an environment, in an 
early stage of an epidemic outbreak, or when running jobs in cloud computing. 

BMCs have a wide range of applications in modelling various physical 
phenomena, such as nuclear chain reactions, red blood cell formation, population 
genetics, population migration, epidemic outbreaks, and molecular biology. 
Many examples of BMC models used in biological systems are discussed in [12]. 

Branching Markov Decision Processes (BMDPs) extend BMCs by allowing 
a controller to choose the branching dynamics for each entity. This choice is 
modelled as nondeterministic, instead of random. This extension is analogous to 
how Markov Decision Processes (MDPs) generalise Markov chains (MCs) [24]. 
Allowing an external controller to select a mode of branching allows us to study 
the best/worst behaviour of the examined model. 

As a motivating example, let us discuss a simple model of cloud computing. A 
computation may be divided into tasks in order to finish it faster, as each server 
may have different computational power. Since the computation of each task 
depends on the previous one, the total running time is the sum of the running 
times of each spawned task as well as the time needed to split and merge the 
result of each computation into the final solution. As we shall see, the execution 
of each task is not guaranteed to be successful and is subject to random delays. 
Specifically, let us consider the following model with two different types (T and
S), and two actions (a1 and a2). This BMDP consists of the main task, T, that
may be split (action a1) into three smaller tasks, for simplicity assumed to be
of the same type S, and this split and merger of the intermediate results takes
1 hour (1h). Alternatively (action a2), we can execute the whole task T on the
main server, but it will be slow (8h). Task S can (action a1) be run on a reliable
server in 1.6h or (action a2) an unreliable one that finishes after 1h (irrespective
of whether or not the computation is completed successfully), but with a 40%
chance we need to rerun this task due to the server crashing. We can represent
this model formally as:


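\[
\begin{array}{ll}
T \xrightarrow{a_1} SSS \;\; [1\mathrm{h}] & S \xrightarrow{a_1} \epsilon \;\; [1.6\mathrm{h}] \\
T \xrightarrow{a_2} \epsilon \;\; [8\mathrm{h}] & S \xrightarrow{a_2} 40\% : S \text{ or } 60\% : \epsilon \;\; [1\mathrm{h}]
\end{array}
\]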


We would like to know the infimum of the expected running time (i.e. the
expected running time when optimal decisions are made) of task T. In this case
the optimal control is to pick action a1 first and then action a1 for all tasks
S, with a total running time of 1 + 3 · 1.6 = 5.8h. The expected running time
when picking action a2 for S instead would be 1 + 3 · (1/0.6) = 6 [hours].
Let us now assume that the execution of tasks S for action a1 may be
interrupted with probability 30% by a task of higher priority (type H). Moreover,
these H tasks may be further interrupted by tasks with even higher priority (to
simplify matters, again modelled by type H). The computation time of T is
prolonged by 0.1h for each H spawned. Our model then becomes:


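\[
\begin{array}{lll}
T \xrightarrow{a_1} SSS \;\; [1\mathrm{h}] & S \xrightarrow{a_1} 30\% : H \text{ or } 70\% : \epsilon \;\; [1.6\mathrm{h}] & H \rightarrow 30\% : HH \text{ or } 70\% : \epsilon \;\; [0.1\mathrm{h}] \\
T \xrightarrow{a_2} \epsilon \;\; [8\mathrm{h}] & S \xrightarrow{a_2} 40\% : S \text{ or } 60\% : \epsilon \;\; [1\mathrm{h}] &
\end{array}
\]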


As we shall see, the expected total running time of H can be calculated by
solving the equation x = 0.3(x + x) + 0.1, which gives x = 0.25 [hour]. So the
expected running time of S using action a1 increases by 0.3 · 0.25 = 0.075 [hour]
to 1.675 [hours], which exceeds the 1/0.6 ≈ 1.667 [hours] expected under action a2.
This is enough for the optimal strategy of running S to become a2. Note that if
the probability of H being interrupted is at least 50% then the expected running
time of H becomes ∞.
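
The arithmetic of this example can be checked with a few lines of exact rational
arithmetic. The following is only an illustrative sketch: the helper name
expected_h and the variable names are our own assumptions and are not part of
the model or of any tool.

```python
from fractions import Fraction as Fr

def expected_h(p_interrupt, cost):
    """Expected total time x of an H task, solving x = p*(x + x) + cost.

    Hypothetical helper: the solution is finite only if 2*p < 1.
    """
    if 2 * p_interrupt >= 1:
        return float("inf")
    return cost / (1 - 2 * p_interrupt)

x_h = expected_h(Fr(3, 10), Fr(1, 10))            # expected time of one H task
s_a1_plain = Fr(16, 10)                           # S on the reliable server, no interrupts
s_a1_interrupted = Fr(16, 10) + Fr(3, 10) * x_h   # S on the reliable server, with interrupts
s_a2 = 1 / Fr(6, 10)                              # S on the unreliable server: x = 1 + 0.4 x

# Optimal expected running time of T in the first and in the second model.
t_first_model = min(1 + 3 * min(s_a1_plain, s_a2), Fr(8))
t_second_model = min(1 + 3 * min(s_a1_interrupted, s_a2), Fr(8))
```

Under these assumptions, the first model favours action a1 for S, while in the
second model the interrupt overhead makes a2 the cheaper choice for S.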

When dealing with a real-life process, it is hard to come up with a 
(probabilistic and controlled) model that approximates it well. This requires 
experts to analyse all possible scenarios and estimate the probability of outcomes 
in response to actions based on either complex calculations or the statistical 
analysis of sufficient observational data. For instance, it is hard to estimate the 
probability of an interrupt H occurring in the model above without knowing 
which server will run the task, its usual workload and statistics regarding the 
priorities of the tasks it executes. Even if we do this estimation well, unexpected 
or rare events may happen that would require us to recalibrate the model as we 
observe the system under our control. 

Instead of building such a model explicitly first and fixing the probabilities 
of possible transitions in the system based on our knowledge of the system or 
its statistics, we advocate the use of reinforcement learning (RL) techniques [27] 
that were successfully applied to finding optimal control for finite-state Markov 
Decision Processes (MDPs). Q-learning [30] is a well-studied model-free RL 
approach to compute an optimal control strategy without knowing about the 
model apart from its initial state and the set of actions available in each of 
its states. It also has the advantage that the learning process converges to the 
optimal control while exploiting along the way what it already knows. While the 
formulation of the Q-learning algorithm for BMDPs is straightforward, the proof 
that it works is not. This is because, unlike the MDPs with discounted rewards 
for which the original Q-learning algorithm was defined, our model does not have 
an explicit contraction in each step, nor does boundedness of the optimal values 
or one-step updates hold. Similarly, one cannot generalise the result from [11] 
that estimates the time needed for the Q-learning algorithm to converge within 
ε of the optimal values with high probability for finite-state MDPs.


1.1 Related Work 


The simplest BMC models are Galton-Watson processes [31], discrete-time
models where all entities are of the same type. They date as far back as 1845 [14]
and were used to explain why some aristocratic family surnames became extinct.
The generalisation of this model to multiple types of entities was first studied
in the 1940s by Kolmogorov and Sevast’yanov [17]. For an overview of the results
known for BMCs, see e.g. [13] and [12]. The precise computational complexity 
of decision problems about the probabilities of extinction of an arbitrary BMC 
was first established in [9]. The problem of checking if a given BMC terminates 
almost surely was shown in [5] to be strongly polynomial. The probability of 
acceptance of a run of a BMC by a deterministic parity tree automaton was 
studied in [4] and shown to be computable in PSPACE and in polynomial time 
for probabilities 0 or 1. In [16] a generalisation of the BMCs was considered that 
allowed for limited synchronisation of different tasks. 

BMDPs, a natural generalisation of BMCs to a controlled setting, have been 
studied in the OR literature e.g., [23,26]. Hierarchical MDPs (HMDPs) [10] 
are a special case of BMDPs where there are no cycles in the offspring graph 
(equivalently, no cyclic dependency between types). BMDPs and HMDPs have 
found applications in manpower planning [29], controlled queuing networks [2, 
15], management of livestock [20], and epidemic control [1,25], among others. The 
focus of these works was on optimising the expected average, or the discounted 
reward over a run of the process, or optimising the population growth rate. 
In [10] the decision problem whether the optimal probability of termination 
exceeds a threshold was studied: it was shown to be solvable in PSPACE and 
at least as hard as the square-root sum problem, but one can determine if the 
optimal probability is 0 or 1 in polynomial time. In [7], it was shown that the 
approximation of the optimal probability of extinction for BMDPs can be done 
in polynomial time. The computational complexity of computing the optimal 
expected total cost before extinction for BMDPs follows from [8] and was shown 
there to be computable in polynomial time via a linear program formulation. 
The problem of maximising the probability of reaching a state with an entity of 
a given type for BMDPs was studied in [6]. In [28] an extension of BMDPs with 
real-valued clocks and timing constraints on productions was studied. 


1.2 Summary of the Results 


We show that an adaptation of the Q-learning algorithm converges almost surely 
to the optimal values for BMDPs under mild conditions: all costs are positive 
and each Q-value is selected for update independently at random. We have 
implemented the proposed algorithm in the tool MUNGOJERRIE [21] and tested 
its performance on small examples to demonstrate its efficiency in practice. To 
the best of our knowledge, this is the first time model-free RL has been used for 
the analysis of BMDPs. 


2 Problem Definitions 


2.1 Preliminaries 


We denote by $\mathbb{N}$ the set of non-negative integers, by $\mathbb{R}$ the set of reals, by
$\mathbb{R}_+$ the set of positive reals, and by $\mathbb{R}_{\geq 0}$ the set of non-negative reals. We let
$\overline{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{\infty\}$, and $\overline{\mathbb{R}}_{\geq 0} = \mathbb{R}_{\geq 0} \cup \{\infty\}$. We denote by $|X|$ the cardinality
of a set $X$ and by $X^*$ ($X^\omega$) the set of all possible finite (infinite) sequences of
elements of $X$. Finite sequences are also called lists.


Vectors and Lists. We use $\bar{x}, \bar{y}, \bar{c}$ to denote vectors and $\bar{x}_i$ or $\bar{x}(i)$ to denote the
$i$-th entry of $\bar{x}$. We let $\bar{0}$ denote a vector with all entries equal to 0; its size may vary
depending on the context. Likewise $\bar{1}$ is a vector with all entries equal to 1. For
vectors $\bar{x}, \bar{y} \in \overline{\mathbb{R}}^n_{\geq 0}$, $\bar{x} \leq \bar{y}$ means $x_i \leq y_i$ for every $i$, and $\bar{x} < \bar{y}$ means $\bar{x} \leq \bar{y}$ and
$x_i \neq y_i$ for some $i$. We also make use of the infinity norm $\|\bar{x}\|_\infty = \max_i |\bar{x}(i)|$.
We use $\alpha, \beta, \gamma$ to denote finite lists of elements. For a list $\alpha = a_1, a_2, \ldots, a_k$
we write $\alpha_i$ for the $i$-th element $a_i$ of list $\alpha$ and $|\alpha|$ for its length. For two lists
$\alpha$ and $\beta$ we write $\alpha \cdot \beta$ for their concatenation. The empty list is denoted by $\epsilon$.


Probability Distributions. A finite discrete probability distribution over a
countable set $Q$ is a function $\mu : Q \to [0,1]$ such that $\sum_{q \in Q} \mu(q) = 1$ and its
support set $\mathrm{supp}(\mu) = \{q \in Q \mid \mu(q) > 0\}$ is finite. We say that $\mu \in \mathcal{D}(Q)$ is a
point distribution if $\mu(q) = 1$ for some $q \in Q$.


Markov Decision Processes. Markov decision processes [24] are a well-studied
formalism for systems exhibiting nondeterministic and probabilistic behaviour.


Definition 1. A Markov decision process (MDP) is a tuple $M = (S, A, p, c)$
where:


- $S$ is the set of states;

- $A$ is the set of actions;

- $p : S \times A \to \mathcal{D}(S)$ is a partial function called the probabilistic transition
function; and

- $c : S \times A \to \mathbb{R}$ is the cost function.


We say that an MDP $M$ is finite (discrete) if both $S$ and $A$ are finite
(countable). We write $A(s)$ for the set of actions available at $s$, i.e., the set
of actions $a$ for which $p(s,a)$ is defined. In an MDP $M$, if the current state is
$s$, then one of the actions in $A(s)$ is chosen nondeterministically. If the chosen
action is $a$ then the probability of reaching state $s' \in S$ in the next step is
$p(s,a)(s')$ and the cost incurred is $c(s,a)$.


2.2 Branching Markov Decision Processes 
We are now ready to define (multitype) BMDPs. 


Definition 2. A branching Markov decision process (BMDP) is a tuple $\mathcal{B} =
(P, A, p, c)$ where:


- $P$ is a finite set of types;

- $A$ is a finite set of actions;

- $p : P \times A \to \mathcal{D}(P^*)$ is a partial function called the probabilistic transition
function where every $\mathcal{D}(\cdot)$ is a finite discrete probability distribution; and

- $c : P \times A \to \mathbb{R}_+$ is the cost function.


We write $A(q)$ for the set of actions available to an entity of type $q \in P$, i.e., the
set of actions $a$ for which $p(q,a)$ is defined. A Branching Markov Chain (BMC)
is simply a BMDP with just one action available for each type.

Let us first describe informally how BMDPs evolve. A state of a BMDP $\mathcal{B}$
is a list of elements of $P$ that we call entities. A BMDP starts at some initial
configuration, $\alpha^0 \in P^*$, and the controller picks for one of the entities one of the
actions available to an entity of its type. In the new configuration $\alpha^1$, this one
entity is replaced by the list of new entities that it spawned. This list is picked
according to the probability distribution $p(q,a)$ that depends both on the type
of the entity, $q$, and the action, $a$, performed on it by the controller. The process
proceeds in the same manner from $\alpha^1$, moving to $\alpha^2$, and from there to $\alpha^3$, etc.
Once the state $\epsilon$ is reached, i.e., when no entities are present in the system, the
process stays in that state forever.


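Definition 3 (Semantics of BMDP). The semantics of a BMDP $\mathcal{B} =
(P, A, p, c)$ is an MDP $M_{\mathcal{B}} = (\mathrm{States}_{\mathcal{B}}, \mathrm{Actions}_{\mathcal{B}}, \mathrm{Prob}_{\mathcal{B}}, \mathrm{Cost}_{\mathcal{B}})$ where:


- $\mathrm{States}_{\mathcal{B}} = P^*$ is the set of states;

- $\mathrm{Actions}_{\mathcal{B}} = \mathbb{N} \times A$ is the set of actions;

- $\mathrm{Prob}_{\mathcal{B}} : \mathrm{States}_{\mathcal{B}} \times \mathrm{Actions}_{\mathcal{B}} \to \mathcal{D}(\mathrm{States}_{\mathcal{B}})$ is the probabilistic transition
function such that, for $\alpha \in \mathrm{States}_{\mathcal{B}}$ and $(i,a) \in \mathrm{Actions}_{\mathcal{B}}$, we have that
$\mathrm{Prob}_{\mathcal{B}}(\alpha, (i,a))$ is defined when $i \leq |\alpha|$ and $a \in A(\alpha_i)$; moreover

\[
\mathrm{Prob}_{\mathcal{B}}(\alpha, (i,a))(\alpha_1 \ldots \alpha_{i-1} \cdot \beta \cdot \alpha_{i+1} \ldots) = p(\alpha_i, a)(\beta),
\]

for every $\beta \in P^*$ and 0 in all other cases.

- $\mathrm{Cost}_{\mathcal{B}} : \mathrm{States}_{\mathcal{B}} \times \mathrm{Actions}_{\mathcal{B}} \to \mathbb{R}_+$ is the cost function such that

\[
\mathrm{Cost}_{\mathcal{B}}(\alpha, (i,a)) = c(\alpha_i, a).
\]


For a given BMDP $\mathcal{B}$ and state $\alpha \in \mathrm{States}_{\mathcal{B}}$, we denote by $\mathrm{Actions}_{\mathcal{B}}(\alpha)$ the
set of actions $(i,a) \in \mathrm{Actions}_{\mathcal{B}}$ for which $\mathrm{Prob}_{\mathcal{B}}(\alpha, (i,a))$ is defined.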
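
To make the semantics concrete, the following minimal sketch encodes the first
cloud computing model from the introduction and samples one transition of the
induced MDP. The dictionary encoding and the function names available and step
are our own assumptions, chosen only for illustration; states are represented as
Python tuples of type names and entities are indexed from 1, as in Definition 3.

```python
import random

# Hypothetical encoding: p[(type, action)] is a list of (probability, offspring)
# pairs and c[(type, action)] is the positive one-step cost.
p = {
    ("T", "a1"): [(1.0, ("S", "S", "S"))],
    ("T", "a2"): [(1.0, ())],
    ("S", "a1"): [(1.0, ())],
    ("S", "a2"): [(0.4, ("S",)), (0.6, ())],
}
c = {("T", "a1"): 1.0, ("T", "a2"): 8.0, ("S", "a1"): 1.6, ("S", "a2"): 1.0}

def available(state):
    """All pairs (i, a) such that entity i of the state may perform action a."""
    return [(i, a)
            for i, q in enumerate(state, start=1)
            for (q2, a) in p if q2 == q]

def step(state, i, a, rng):
    """Replace the i-th entity (1-indexed) by sampled offspring; return (state, cost)."""
    q = state[i - 1]
    outcomes = p[(q, a)]
    weights = [prob for prob, _ in outcomes]
    offspring = rng.choices([beta for _, beta in outcomes], weights=weights)[0]
    return state[:i - 1] + offspring + state[i:], c[(q, a)]
```

For instance, step(("T",), 1, "a1", random.Random(0)) replaces the single entity T
by three entities of type S at cost 1.0, and the empty tuple plays the role of the
absorbing state with no entities.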

Note that our semantics of BMDPs assumes an explicit listing of all the 
entities in a particular order similar to [10]. One could, instead, define this as 
a multi-set or simply a vector just counting the number of occurrences of each 
entity as in [23]. As argued in [10], all these models are equivalent to each other. 
Furthermore, we assume that the controller expands a single entity of his choice
at a time rather than all of them being expanded simultaneously. As argued in [32],
that makes no difference for the optimal values of the expected total cost that
we study in this paper, provided that all transitions’ costs are positive.


2.3 Strategies 
A path of a BMDP $\mathcal{B}$ is a finite or infinite sequence

\[
\pi = \alpha^0, ((i_1, a_1), \alpha^1), ((i_2, a_2), \alpha^2), ((i_3, a_3), \alpha^3), \ldots
\in \mathrm{States}_{\mathcal{B}} \times ((\mathrm{Actions}_{\mathcal{B}} \times \mathrm{States}_{\mathcal{B}})^* \cup (\mathrm{Actions}_{\mathcal{B}} \times \mathrm{States}_{\mathcal{B}})^\omega),
\]

consisting of the initial state and a finite or infinite sequence of action and state
pairs, such that $\mathrm{Prob}_{\mathcal{B}}(\alpha^j, (i_{j+1}, a_{j+1}))(\alpha^{j+1}) > 0$ for any $0 \leq j < |\pi|$, where $|\pi|$ is
the number of actions taken during path $\pi$. ($|\pi| = \infty$ if the path is infinite.) For
a path $\pi$, we denote by $\pi_A(j) = (i_j, a_j)$ the $j$-th action taken along path $\pi$, by
$\pi_S(j)$ $(= \alpha^j)$ the $j$-th state visited, where $\pi_S(0)$ $(= \alpha^0)$ is the initial state, and by
$\pi(j)$ $(= \alpha^0, ((i_1, a_1), \alpha^1), \ldots, ((i_j, a_j), \alpha^j))$ the first $j$ action-state pairs of $\pi$.

We call a path of infinite (finite) length a run (finite path). We write $\mathrm{Runs}_{\mathcal{B}}$
($\mathrm{FPath}_{\mathcal{B}}$) for the sets of all runs (finite paths) and $\mathrm{Runs}_{\mathcal{B},\alpha}$ ($\mathrm{FPath}_{\mathcal{B},\alpha}$) for the
sets of all runs (finite paths) that start at a given initial state $\alpha \in \mathrm{States}_{\mathcal{B}}$, i.e.,
paths $\pi$ with $\pi_S(0) = \alpha$. We write $\mathrm{last}(\pi)$ for the last state of a finite path $\pi$.

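A strategy in BMDP $\mathcal{B}$ is a function $\sigma : \mathrm{FPath}_{\mathcal{B}} \to \mathcal{D}(\mathrm{Actions}_{\mathcal{B}})$ such that,
for all $\pi \in \mathrm{FPath}_{\mathcal{B}}$, $\mathrm{supp}(\sigma(\pi)) \subseteq \mathrm{Actions}_{\mathcal{B}}(\mathrm{last}(\pi))$. We write $\Sigma_{\mathcal{B}}$ for the set
of all strategies. A strategy is called static, if it always applies an action to the
first entity in any state and for all entities of the same type in any state it
picks the same action. A static strategy $\tau$ is essentially a function of the form
$\sigma : P \to A$, i.e., for an arbitrary $\pi \in \mathrm{FPath}_{\mathcal{B}}$, we have $\tau(\pi) = (1, \sigma(\mathrm{last}(\pi)_1))$
whenever $\mathrm{last}(\pi) \neq \epsilon$.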

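A strategy $\sigma \in \Sigma_{\mathcal{B}}$ and an initial state $\alpha$ induce a probability measure over
the set of runs of BMDP $\mathcal{B}$ in the following way: the basic open sets of $\mathrm{Runs}_{\mathcal{B}}$
are of the form $\pi \cdot (\mathrm{Actions}_{\mathcal{B}} \times \mathrm{States}_{\mathcal{B}})^\omega$, where $\pi \in \mathrm{FPath}_{\mathcal{B}}$, and the measure of
this open set is equal to $\prod_{i=0}^{|\pi|-1} \sigma(\pi(i))(\pi_A(i+1)) \cdot \mathrm{Prob}_{\mathcal{B}}(\pi_S(i), \pi_A(i+1))(\pi_S(i+1))$
if $\pi_S(0) = \alpha$ and equal to 0 otherwise. It is a classical result of measure theory
that this extends to a unique measure over all Borel subsets of $\mathrm{Runs}_{\mathcal{B}}$ and we
will denote this measure by $P^{\sigma}_{\mathcal{B},\alpha}$.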


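Let $f : \mathrm{Runs}_{\mathcal{B}} \to \overline{\mathbb{R}}_+$ be a function measurable with respect to $P^{\sigma}_{\mathcal{B},\alpha}$. The
expected value of $f$ under strategy $\sigma$ when starting at $\alpha$ is defined as $E^{\sigma}_{\mathcal{B},\alpha}\{f\} =
\int_{\mathrm{Runs}_{\mathcal{B}}} f \, dP^{\sigma}_{\mathcal{B},\alpha}$ (which can be $\infty$ even if the probability that the value of $f$
is infinite is 0). The infimum expected value of $f$ in $\mathcal{B}$ when starting at $\alpha$ is
defined as $V_*(\alpha)(f) = \inf_{\sigma \in \Sigma_{\mathcal{B}}} E^{\sigma}_{\mathcal{B},\alpha}\{f\}$. A strategy, $\sigma$, is said to be optimal
if $E^{\sigma}_{\mathcal{B},\alpha}\{f\} = V_*(\alpha)(f)$ and $\varepsilon$-optimal if $E^{\sigma}_{\mathcal{B},\alpha}\{f\} \leq V_*(\alpha)(f) + \varepsilon$. Note that
$\varepsilon$-optimal strategies always exist by definition. We omit the subscript $\mathcal{B}$, e.g.,
in $\mathrm{States}_{\mathcal{B}}$, $\Sigma_{\mathcal{B}}$, etc., when the intended BMDP is clear from the context.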

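For a given BMDP $\mathcal{B}$ and $N \geq 0$ we define $\mathrm{Total}_N(\pi)$, the cumulative cost
of a run $\pi$ after $N$ steps, as $\mathrm{Total}_N(\pi) = \sum_{i=0}^{N-1} \mathrm{Cost}(\pi_S(i), \pi_A(i+1))$. For a
configuration $\alpha \in \mathrm{States}$ and a strategy $\sigma \in \Sigma$, let $\mathrm{ETotal}_N(\mathcal{B}, \alpha, \sigma)$ be the
$N$-step expected total cost defined as $\mathrm{ETotal}_N(\mathcal{B}, \alpha, \sigma) = E^{\sigma}_{\mathcal{B},\alpha}\{\mathrm{Total}_N\}$ and
the expected total cost be $\mathrm{ETotal}_*(\mathcal{B}, \alpha, \sigma) = \lim_{N \to \infty} \mathrm{ETotal}_N(\mathcal{B}, \alpha, \sigma)$. This
last value can potentially be $\infty$. For each starting state $\alpha$, we compute the
optimal expected cost over all strategies of a BMDP starting at $\alpha$, denoted by
$\mathrm{ETotal}_*(\mathcal{B}, \alpha)$, i.e.,

\[
\mathrm{ETotal}_*(\mathcal{B}, \alpha) = \inf_{\sigma \in \Sigma_{\mathcal{B}}} \mathrm{ETotal}_*(\mathcal{B}, \alpha, \sigma).
\]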
As we are going to prove in Theorem 4.b, for any $\alpha \in \mathrm{States}$, we have

\[
\mathrm{ETotal}_*(\mathcal{B}, \alpha) = \sum_{i=1}^{|\alpha|} \mathrm{ETotal}_*(\mathcal{B}, \alpha_i).
\]

This justifies focusing on this value for initial states that consist of a single entity
only, as we will do in the following section.


3 Fixed Point Equations 


Following [8], we define here a linear equation system with a minimum operator
whose Least Fixed Point solution yields the desired optimal values for each type
of a BMDP with non-negative costs. This system generalises Bellman’s
equations for finite-state MDPs. We use a variable $x_q$ for each unknown
$\mathrm{ETotal}_*(\mathcal{B}, q)$ where $q \in P$. Let $\bar{x}$ be the vector of all $x_q$, where $q \in P$. The
system has one equation of the form $x_q = F_q(\bar{x})$ for each type $q \in P$, defined as

\[
x_q = \min_{a \in A(q)} \Big( c(q,a) + \sum_{\alpha \in P^*} p(q,a)(\alpha) \sum_{i \leq |\alpha|} x_{\alpha_i} \Big). \qquad (\spadesuit)
\]

We denote the system in vector form by $\bar{x} = F(\bar{x})$. Given a BMDP, we can
easily construct its associated system in linear time. Let $\bar{c}^* \in \overline{\mathbb{R}}^n_{\geq 0}$ denote the
$n$-dimensional vector of $\mathrm{ETotal}_*(\mathcal{B}, q)$’s where $n = |P|$. Let us define $\bar{x}^0 = \bar{0}$,
$\bar{x}^{k+1} = F^{k+1}(\bar{0}) = F(\bar{x}^k)$, for $k \geq 0$.


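Theorem 4. The following hold:


(a) The map $F : \overline{\mathbb{R}}^n_{\geq 0} \to \overline{\mathbb{R}}^n_{\geq 0}$ is monotone and continuous (and so $\bar{0} \leq \bar{x}^k \leq
\bar{x}^{k+1}$ for all $k \geq 0$).

(b) $\bar{c}^* = F(\bar{c}^*)$.

(c) For all $k \geq 0$, $\bar{x}^k \leq \bar{c}^*$.

(d) For all $\bar{c}' \in \overline{\mathbb{R}}^n_{\geq 0}$, if $\bar{c}' = F(\bar{c}')$, then $\bar{c}^* \leq \bar{c}'$.

(e) $\bar{c}^* = \lim_{k \to \infty} \bar{x}^k$.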
Proof.


(a) All equations in the system $F(\bar{x})$ are minima of linear functions with
non-negative coefficients and constants, and hence monotonicity and continuity
are preserved.

(b) It suffices to show that once action $a$ is taken when starting with a single
entity $q$ and, as a result, $q$ is replaced by $\alpha$ with probability $p(q,a)(\alpha)$, then
the expected total cost is equal to:

\[
c(q,a) + \sum_{i \leq |\alpha|} \mathrm{ETotal}_*(\mathcal{B}, \alpha_i). \qquad (\clubsuit)
\]

This is because then the expected total cost of picking action $a$ when at
$q$ is just a weighted sum of these expressions with weights $p(q,a)(\alpha)$ for
offspring $\alpha$. And finally, to optimise the cost, one would pick an action $a$
with the smallest such expected total cost showing that

\[
\mathrm{ETotal}_*(\mathcal{B}, q) = \min_{a \in A(q)} \Big( c(q,a) + \sum_{\alpha \in P^*} p(q,a)(\alpha) \sum_{i \leq |\alpha|} \mathrm{ETotal}_*(\mathcal{B}, \alpha_i) \Big)
\]

indeed holds.

Now, to show $(\clubsuit)$, consider an $\varepsilon$-optimal strategy $\sigma_i$ for a BMDP that starts
at $\alpha_i$. It can easily be composed into a strategy $\sigma$ that starts at $\alpha$ just by
executing $\sigma_1$ first until all descendants of $\alpha_1$ die out, before moving on to
$\sigma_2$, etc. If one of these strategies, $\sigma_i$, never stops executing then, due to the
assumption that all costs are positive, the expected total cost when starting
with $\alpha_i$ has to be infinite and so has to be the overall cost when starting
with $\alpha$ (as all descendants of $\alpha_i$ have to die out before the overall process
terminates), so $(\clubsuit)$ holds. This shows that $c(q,a) + \sum_{i \leq |\alpha|} \mathrm{ETotal}_*(\mathcal{B}, \alpha_i)$
can be achieved when starting at $\alpha$. At the same time, we cannot do better
because that would imply the existence of a strategy $\sigma'$ for one of the entities
$\alpha_j$ with a better cost than its optimal cost $\mathrm{ETotal}_*(\mathcal{B}, \alpha_j)$.

(c) Since $\bar{x}^0 = \bar{0} \leq \bar{c}^*$ and due to (b), it follows by repeated application of $F$ to
both sides of this inequality that $\bar{x}^k \leq F(\bar{c}^*) = \bar{c}^*$, for all $k \geq 0$.

(d) Consider any fixed point $\bar{c}'$ of the equation system $F(\bar{x})$. We will prove that
$\bar{c}^* \leq \bar{c}'$. Let us denote by $\sigma'$ a static strategy that picks for each type an
action with the minimum value of operator $F$ in $\bar{c}'$, i.e., for each entity
$q$ we choose

\[
\sigma'(q) = \arg\min_{a \in A(q)} \Big( c(q,a) + \sum_{\alpha \in P^*} p(q,a)(\alpha) \sum_{i \leq |\alpha|} \bar{c}'_{\alpha_i} \Big)
\]

where we break ties lexicographically.

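We now claim that, for all $k \geq 0$, $\mathrm{ETotal}_k(\mathcal{B}, q, \sigma') \leq \bar{c}'_q$ holds. For $k = 0$,
this is trivial as $\mathrm{ETotal}_0(\mathcal{B}, q, \sigma') = 0 \leq \bar{c}'_q$. For $k > 0$, we have that

\[
\begin{aligned}
\mathrm{ETotal}_k(\mathcal{B}, q, \sigma')
&\overset{(1)}{\leq} c(q, \sigma'(q)) + \sum_{\alpha \in P^*} p(q, \sigma'(q))(\alpha) \sum_{i \leq |\alpha|} \mathrm{ETotal}_{k-1}(\mathcal{B}, \alpha_i, \sigma') \\
&\overset{(2)}{\leq} c(q, \sigma'(q)) + \sum_{\alpha \in P^*} p(q, \sigma'(q))(\alpha) \sum_{i \leq |\alpha|} \bar{c}'_{\alpha_i} \\
&\overset{(3)}{=} \min_{a \in A(q)} \Big( c(q,a) + \sum_{\alpha \in P^*} p(q,a)(\alpha) \sum_{i \leq |\alpha|} \bar{c}'_{\alpha_i} \Big) \\
&\overset{(4)}{=} \bar{c}'_q
\end{aligned}
\]

where (1) follows from the fact that after taking action $\sigma'(q)$ first, there
are only $k-1$ steps left of the BMDP $\mathcal{B}$ that would need to be distributed
among the offspring $\alpha$ of $q$ somehow. Allowing for $k-1$ steps for each of the
entities $\alpha_i$ is clearly an overestimate of the actual cost. (2) follows from the
inductive assumption. (3) follows from the definition of $\sigma'$. The last equality,
(4), follows from the fact that $\bar{c}'$ is a fixed point of $F$.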

Finally, for every $q \in P$, from the definition we have $\bar{c}^*_q = \mathrm{ETotal}_*(\mathcal{B}, q) \leq
\mathrm{ETotal}_*(\mathcal{B}, q, \sigma') = \lim_{k \to \infty} \mathrm{ETotal}_k(\mathcal{B}, q, \sigma')$ and each element of the last
sequence was just shown to be $\leq \bar{c}'_q$.

(e) We know that $\bar{x}^* = \lim_{k \to \infty} \bar{x}^k$ exists in $\overline{\mathbb{R}}^n_{\geq 0}$ because it is a monotonically
non-decreasing sequence (note that some entries may be infinite). In fact
we have $\bar{x}^* = \lim_{k \to \infty} F^{k+1}(\bar{0}) = F(\lim_{k \to \infty} F^k(\bar{0}))$, and thus $\bar{x}^*$ is a fixed
point of $F$. So from (d) we have $\bar{c}^* \leq \bar{x}^*$. At the same time, due to (c), we
have $\bar{x}^k \leq \bar{c}^*$ for all $k \geq 0$, so $\bar{x}^* = \lim_{k \to \infty} \bar{x}^k \leq \bar{c}^*$ and thus $\lim_{k \to \infty} \bar{x}^k = \bar{c}^*$.


The following is a simple corollary of Theorem 4. 
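Corollary 5. In BMDPs, there exists an optimal static control strategy $\sigma^*$.


Proof. It is enough to pick as $\sigma^*$ the strategy $\sigma'$ from Theorem 4.d, for $\bar{c}' = \bar{c}^*$.
We showed there that for all $k \geq 0$ and $q \in P$ we have $\mathrm{ETotal}_k(\mathcal{B}, q, \sigma^*) \leq \bar{c}^*_q$.
So $\mathrm{ETotal}_*(\mathcal{B}, q, \sigma^*) = \lim_{k \to \infty} \mathrm{ETotal}_k(\mathcal{B}, q, \sigma^*) \leq \bar{c}^*_q = \mathrm{ETotal}_*(\mathcal{B}, q)$, so in
fact $\mathrm{ETotal}_*(\mathcal{B}, q, \sigma^*) = \mathrm{ETotal}_*(\mathcal{B}, q)$ has to hold as clearly $\mathrm{ETotal}_*(\mathcal{B}, q, \sigma^*) \geq
\mathrm{ETotal}_*(\mathcal{B}, q)$.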


Note that for a BMDP with a fixed static strategy $\sigma$ (or equivalently a BMC),
we have that $F(\bar{x}) = B_\sigma \bar{x} + \bar{c}_\sigma$, for some non-negative matrix $B_\sigma \in \mathbb{R}^{n \times n}_{\geq 0}$, and
a positive vector $\bar{c}_\sigma > 0$ consisting of all one step costs $c(q, \sigma(q))$. We will refer
to $F$ as $F_\sigma$ in such a case and exploit this fact later in various proofs.

We now show that $\bar{c}^*$ is in fact essentially a unique fixed point of $F$.
Theorem 6. If $F(\bar{x}) = \bar{x}$ and $\bar{x}_q < \infty$ for some $q \in P$ then $\bar{x}_q = \bar{c}^*_q$.

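Proof. By Corollary 5, there exists an optimal static strategy, denoted by $\sigma^*$,
which yields the finite optimal reward vector $\bar{c}^*$.

We clearly have that $\bar{x} = F(\bar{x}) \leq F_{\sigma^*}(\bar{x})$, because $\sigma^*$ is just one possible pick
of actions for each type rather than the minimal one as in $(\spadesuit)$. Furthermore,

\[
\begin{aligned}
F_{\sigma^*}(\bar{x}) &= B_{\sigma^*} \bar{x} + \bar{c}_{\sigma^*} \\
&\leq B_{\sigma^*}(B_{\sigma^*} \bar{x} + \bar{c}_{\sigma^*}) + \bar{c}_{\sigma^*} \\
&= B_{\sigma^*}^2 \bar{x} + (B_{\sigma^*} + I)\, \bar{c}_{\sigma^*} \\
&\leq \ldots \leq \lim_{k \to \infty} B_{\sigma^*}^k \bar{x} + \Big( \sum_{k=0}^{\infty} B_{\sigma^*}^k \Big) \bar{c}_{\sigma^*}.
\end{aligned}
\]

Note that $\bar{c}^* = (\sum_{k=0}^{\infty} B_{\sigma^*}^k)\, \bar{c}_{\sigma^*}$, because

\[
\bar{c}^* = \lim_{k \to \infty} F^k(\bar{0}) = \lim_{k \to \infty} F_{\sigma^*}^k(\bar{0}) = \lim_{k \to \infty} \sum_{i=0}^{k-1} B_{\sigma^*}^i \bar{c}_{\sigma^*}.
\]

Due to Theorem 4.d, we know that $\bar{c}^*_q \leq \bar{x}_q < \infty$, so all entries in the $q$-th
row of $B_{\sigma^*}^k$ have to converge to 0 as $k \to \infty$, because otherwise the $q$-th row
of $\sum_{k=0}^{\infty} B_{\sigma^*}^k$ would have at least one infinite value and, as a result, the $q$-th
position of $\bar{c}^* = (\sum_{k=0}^{\infty} B_{\sigma^*}^k)\, \bar{c}_{\sigma^*}$ would also be infinite as all entries of $\bar{c}_{\sigma^*}$ are
positive. Therefore, $\lim_{k \to \infty} (B_{\sigma^*}^k \bar{x})_q = 0$ and so

\[
\bar{x}_q \leq \Big( \lim_{k \to \infty} B_{\sigma^*}^k \bar{x} \Big)_q + \Big( \Big( \sum_{k=0}^{\infty} B_{\sigma^*}^k \Big) \bar{c}_{\sigma^*} \Big)_q = \bar{c}^*_q.
\]

The proof is now complete.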


4 Q-learning 


We next discuss the applicability of Q-learning to the computation of the fixed 
point defined in the previous section. 

Q-learning [30] is a well-studied model-free RL approach to compute an
optimal strategy for discounted rewards. Q-learning computes so-called Q-values
for every state-action pair. Intuitively, once Q-learning has converged to the fixed
point, $Q(s,a)$ is the optimal reward the agent can get while performing action $a$
after starting at $s$. The Q-values can be initialised arbitrarily, but ideally they
should be close to the actual values. Q-learning learns over a number of episodes,
each consisting of a sequence of actions with bounded length. An episode can
terminate early if a sink-state or another non-productive state is reached. Each
episode starts at the designated initial state $s_0$. The Q-learning process moves
from state to state of the MDP using one of its available actions and accumulates
rewards along the way. Suppose that in the $i$-th step, the process has reached
state $s_i$. It then either performs the currently (believed to be) optimal action
(so-called exploitation option) or, with probability $\varepsilon$, picks uniformly at random
one of the actions available at $s_i$ (so-called exploration option). Either way, if
$a_i$, $r_i$, and $s_{i+1}$ are the action picked, reward observed and the state the process
moved to, respectively, then the Q-value is updated as follows:

\[
Q_{i+1}(s_i, a_i) = (1 - \lambda_i)\, Q_i(s_i, a_i) + \lambda_i \big( r_i + \gamma \max_a Q_i(s_{i+1}, a) \big),
\]

where $\lambda_i \in \,]0,1[$ is the learning rate and $\gamma \in \,]0,1]$ is the discount factor. Note the
model-freeness: this update does not depend on the set of transitions nor their
probabilities. For all other pairs $s, a$ we have $Q_{i+1}(s,a) = Q_i(s,a)$, i.e., they are
left unchanged. Watkins and Dayan showed the convergence of Q-learning [30].
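
The update rule above can be written as a short function. This is a minimal
sketch of the standard tabular rule, not code from any particular tool; the
function name q_update, the dictionary representation of the Q-table, and the
convention that a state without available actions contributes 0 are our own
assumptions.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, lam, gamma):
    """Apply one Q-learning update to Q[(s, a)] and return the new value.

    Q is a mapping from (state, action) pairs to values, lam is the learning
    rate and gamma the discount factor. Assumption: if no action is available
    in s_next (e.g. a sink state), the future value is taken to be 0.
    """
    best_next = max((Q[(s_next, b)] for b in actions_next), default=0.0)
    Q[(s, a)] = (1 - lam) * Q[(s, a)] + lam * (r + gamma * best_next)
    return Q[(s, a)]

# Usage: Q-values initialised to 0, one observed transition s0 --a--> s1 with reward 1.
Q = defaultdict(float)
first = q_update(Q, "s0", "a", r=1.0, s_next="s1", actions_next=["a", "b"],
                 lam=0.5, gamma=0.9)
```

Only the entry for the visited pair changes; no transition probabilities are
consulted, which is the model-freeness noted above.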


Theorem 7 (Convergence [30]). For $\gamma < 1$, bounded rewards $r_i$ and learning
rates $0 \leq \lambda_i < 1$ satisfying:

\[
\sum_{i=0}^{\infty} \lambda_i = \infty \quad \text{and} \quad \sum_{i=0}^{\infty} \lambda_i^2 < \infty,
\]

we have that $Q_i(s,a) \to Q(s,a)$ as $i \to \infty$ for all $s, a \in S \times A$ almost surely if all
$(s,a)$ pairs are visited infinitely often.
However, in the total reward setting that corresponds to Q-learning with
discount factor $\gamma = 1$, Q-learning may not converge, or converge to incorrect
values. It is nevertheless guaranteed to work for finite-state MDPs in the setting of
undiscounted total reward with a target sink-state under the assumption that
all strategies reach that sink-state almost surely. The assumption that we make
instead is that every transition of a BMDP incurs a positive cost. This guarantees
that a process that does not terminate almost surely generates an infinite
expected reward, in which case the Q-learning will converge (or rather diverge) to
$\infty$, so our results generalise these existing results for Q-learning.

We adapt the Q-learning algorithm to minimise cost as follows. Each episode
starts at the designated initial state $q_0 \in P$. The Q-learning process moves from
state to state of the BMDP using one of its available actions and accumulates
costs along the way. Suppose that, in the i-th step, the process has reached state
$\alpha$. It then selects uniformly at random one of the entities of $\alpha$, e.g., the j-th
one, $\alpha_j$, and either performs the currently (believed to be) optimal action or,
with probability $\epsilon$, picks an action uniformly at random among all the actions
available for $\alpha_j$. If $a_i$ is the action picked, and $c$ and $\beta$ denote the observed
cost and the entities spawned by this action, respectively, then the Q-value of the
pair $(\alpha_j, a_i)$ is updated as follows:

\[ Q_{i+1}(\alpha_j, a_i) = (1 - \lambda_i)\, Q_i(\alpha_j, a_i) + \lambda_i \Bigl(c + \sum_{k=1}^{|\beta|} \min_{a \in A(\beta_k)} Q_i(\beta_k, a)\Bigr), \]

and all other Q-values are left unchanged. In the next section we show that,
under rather mild conditions, Q-learning almost surely converges (diverges) to
the optimal finite (respectively, infinite) value of $c^*$.
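The cost-minimising update for BMDPs can be sketched in a few lines of Python. This is only an illustration under assumptions: the type name `"T"`, the action names, and the function `bmdp_q_update` are hypothetical, and the spawned entities are passed in as an explicit list rather than sampled from a model.

```python
from collections import defaultdict


def bmdp_q_update(Q, entity_type, action, cost, spawned, actions, lam):
    """Blend the old Q-value with the sampled target for one entity.

    The target is the observed cost plus, for every spawned entity,
    the smallest current Q-value among the actions of its type.
    """
    target = cost + sum(
        min(Q[(t, b)] for b in actions[t]) for t in spawned
    )
    Q[(entity_type, action)] = (1 - lam) * Q[(entity_type, action)] + lam * target


# Hypothetical single-type example with two actions.
actions = {"T": ["fast", "slow"]}
Q = defaultdict(float)
Q[("T", "fast")] = 3.0
Q[("T", "slow")] = 2.0

# Taking "fast" costs 1 and spawns two new entities of type "T".
bmdp_q_update(Q, "T", "fast", cost=1.0, spawned=["T", "T"], actions=actions, lam=0.5)

# An entity that dies without offspring contributes only its cost.
bmdp_q_update(Q, "T", "slow", cost=1.0, spawned=[], actions=actions, lam=0.5)
```

In contrast to the MDP update, the successor term is a sum over all spawned entities, so an empty offspring list reduces the target to the one-step cost.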


5 Convergence of Q-Learning for BMDPs 


We show almost sure convergence of Q-learning to the optimal values $c^*$ in
a number of stages. We first focus on the case when all optimal values in $c^*$ are
finite. In such a case, we show a weak convergence of the expected optimal values
for BMCs to the unique fixed-point $c^*$, as defined in Sect. 3. To establish this,
we show that the expected Q-values are monotonically decreasing (increasing) if
we start with Q-values $\kappa c^*$ for $\kappa \ge 1$ ($\kappa \le 1$). This convergence from above and
below gives us convergence in expectation using the squeeze theorem.

We then establish almost sure convergence to $c^*$ by proving a contraction
argument, with the extra assumption that the selection of the Q-value to update
is done independently at random in each step.

In the next step, we extend this result to BMDPs, first establishing that
Q-learning will almost surely converge to the region of the Q-values less than or
equal to $c^*$. We then show that, when considering the pointwise limes inferior
values of the sequences of Q-values, there is no point in that region such that
every $\epsilon$-ball around it has a non-zero probability to be represented in the limes
inferior. This establishes that $c^*$ is the fixed point the Q-values converge against.


Reinforcement Learning for Branching Markov Decision Processes 663 


Only at the very end, we show that Q-learning also converges (or rather
diverges) to the optimal value even if that value happens to be infinite. We then
turn to a type with non-finite optimal value and provide an argument for the
divergence to $\infty$ of its corresponding Q-value.

We assume that all the Q-values are stored in a vector $Q$ of size $|P| \cdot |A|$.
We also use $Q(q,a)$ to refer to the entry for type $q \in P$ and action $a \in A(q)$. We
introduce the target for Q operator, $T$, that maps a Q-values vector $Q$ to:

\[ T(Q)(q,a) = c(q,a) + \sum_{\alpha \in P^*} p(q,a)(\alpha) \sum_{i=1}^{|\alpha|} \min_{a_i \in A(\alpha_i)} Q(\alpha_i, a_i). \]

We call $T$ the 'target', because, when the $Q(q,a)$ value is updated, then

\[ E(Q_{i+1}(q,a)) = (1 - \lambda_i)\, Q_i(q,a) + \lambda_i\, T(Q_i)(q,a) \]

holds, whereas otherwise $Q_{i+1}(q,a) = Q_i(q,a)$.
Thus, when $Q(q,a)$ is selected for update with a chance of $p_{q,a}$, we have that

\[ E(Q_{i+1}(q,a)) = (1 - \lambda_i p_{q,a})\, Q_i(q,a) + \lambda_i p_{q,a}\, T(Q_i)(q,a). \tag{9} \]
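When the model is known, the target operator can be evaluated directly. The following Python sketch does so for an explicitly given BMDP; the dictionary-based encoding (offspring tuples mapped to probabilities) and the names `target`, `prob`, `cost`, and `actions` are assumptions made for this example only.

```python
def target(Q, cost, prob, actions):
    """Evaluate T(Q) for every type-action pair of an explicitly given BMDP.

    prob[(q, a)] maps each offspring tuple alpha to its probability;
    cost[(q, a)] is the one-step cost; actions[t] lists the actions of type t.
    """
    result = {}
    for (q, a), dist in prob.items():
        expected = sum(
            p * sum(min(Q[(t, b)] for b in actions[t]) for t in alpha)
            for alpha, p in dist.items()
        )
        result[(q, a)] = cost[(q, a)] + expected
    return result


# Hypothetical one-type BMDP: action "a" is cheap, action "b" may spawn two entities.
actions = {"x": ["a", "b"]}
cost = {("x", "a"): 1.0, ("x", "b"): 2.0}
prob = {
    ("x", "a"): {(): 0.5, ("x",): 0.5},
    ("x", "b"): {(): 0.75, ("x", "x"): 0.25},
}

# These Q-values solve Q = T(Q): the value of "x" is min(2, 3) = 2.
Q_fixed = {("x", "a"): 2.0, ("x", "b"): 3.0}
T_of_Q = target(Q_fixed, cost, prob, actions)
```

Q-learning never calls such a function, since it has no access to `prob`; the sketch only shows the quantity that each sampled update estimates in expectation.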


5.1 Convergence for BMCs with Finite c* 


Since BMCs have only one action, we omit mentioning it for ease of notation.
Note that for BMCs, the target for the Q-values is a simple affine function:

\[ T(Q)(q) = c(q) + \sum_{\alpha \in P^*} p(q)(\alpha) \sum_{i=1}^{|\alpha|} Q(\alpha_i). \]

It coincides with the operator $F$ as defined in Sect. 3. Therefore, due to
Theorem 6, $T$ has a unique fixed point, which is $c^*$. Moreover, $T(Q) = BQ + c$,
where $B$ is a non-negative matrix and $c$ is a vector of one-step costs $c(q)$, which
are all positive.

Naturally, applying $T$ to a non-negative vector $Q$ or multiplying it by $B$ are
monotone: $Q \ge Q'$ implies $T(Q) \ge T(Q')$ and $BQ \ge BQ'$. Also, due to the linearity
of $T$, $E(T(Q)) = T(E(Q))$ holds, where $Q$ is a random vector.
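The affine form $T(Q) = BQ + c$ is easy to experiment with. The sketch below uses a hypothetical two-type BMC (the matrix `B` and cost vector `c` are invented for illustration) and approximates the fixed point by iterating $T$ from the zero vector, which is a simple numerical stand-in and not the method used in the paper.

```python
def apply_T(B, c, Q):
    """Affine BMC target T(Q) = B Q + c, with B given as a list of rows."""
    return [sum(b * x for b, x in zip(row, Q)) + ci for row, ci in zip(B, c)]


def fixed_point(B, c, steps=200):
    """Iterate T from the zero vector to approximate the fixed point c*."""
    Q = [0.0] * len(c)
    for _ in range(steps):
        Q = apply_T(B, c, Q)
    return Q


# Hypothetical BMC: B[q][q'] is the expected number of q' spawned by q.
B = [[0.5, 0.25],
     [0.0, 0.5]]
c = [1.0, 1.0]

c_star = fixed_point(B, c)          # solves c* = B c* + c, here (3, 2)

# Monotonicity: a pointwise larger argument gives a pointwise larger image.
lower = apply_T(B, c, [1.0, 2.0])
upper = apply_T(B, c, [4.0, 4.0])
```

For this example the fixed point can be checked by hand: the second type satisfies $x = 1 + 0.5x$, so $x = 2$, and the first satisfies $y = 1 + 0.5y + 0.25 \cdot 2$, so $y = 3$.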

We now start with a lemma describing the behaviour of Q-learning for initial
Q-values when they happen to be equal to $\kappa c^*$ for some $\kappa \ge 1$.


Lemma 8. Let $Q_0 = \kappa c^*$ for a scalar factor $\kappa \ge 1$. Then the following holds
for all $i \in \mathbb{N}$,

\[ c^* \le T(E(Q_i)) \le E(Q_{i+1}) \le E(Q_i), \]

assuming that the Q-value to be updated in each step is selected independently at
random.
Proof. We show this by induction. For the induction basis ($i = 0$), we have that
$c^* \le Q_0$ by definition.

As $c^*$ is the fixed-point of $T$, we have $T(c^*) = c^*$, and the monotonicity of $T$
provides $T(c^*) \le T(Q_0)$. At the same time

\[
\begin{aligned}
T(Q_0) = T(\kappa c^*) &= B \kappa c^* + c \\
&= \kappa (B c^* + c) - \kappa c + c \\
&= \kappa c^* - (\kappa - 1) c \\
&= Q_0 - (\kappa - 1) c \le Q_0 .
\end{aligned}
\]

This provides $c^* \le T(E(Q_0)) \le E(Q_0)$. Finally, $T(E(Q_0)) \le E(Q_0)$ entails
for a learning rate $\lambda_0 \in [0,1]$ that $T(E(Q_0)) \le E(Q_1) \le E(Q_0)$ due to (9).

For the induction step ($i \mapsto i+1$), we use the induction hypothesis

\[ c^* \le T(E(Q_i)) \le E(Q_{i+1}) \le E(Q_i). \]

The monotonicity of $T$ and $c^* \le E(Q_{i+1}) \le E(Q_i)$ imply that
$T(c^*) \le T(E(Q_{i+1})) \le T(E(Q_i))$ holds. With $T(c^*) = c^*$ (from the fixed point equations)
and the induction hypothesis, $c^* \le T(E(Q_{i+1})) \le E(Q_{i+1})$ follows.

Using $T(E(Q_{i+1})) = E(T(Q_{i+1}))$, this provides $E(T(Q_{i+1})) \le E(Q_{i+1})$,
which implies with $\lambda_{i+1} \in [0,1]$ that

\[ T(E(Q_{i+1})) = E(T(Q_{i+1})) \le E(Q_{i+2}) \le E(Q_{i+1}) \]

holds, completing the induction step.
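The chain of inequalities in Lemma 8 can be observed numerically. The following sketch iterates the expected update (9) for a hypothetical two-type BMC, assuming for simplicity that every type is selected with the same probability `p_sel` and that $\lambda_i = 1/(i+2)$; both choices, like the matrix and cost vector, are assumptions made for this illustration.

```python
def apply_T(B, c, Q):
    """Affine BMC target T(Q) = B Q + c, with B given as a list of rows."""
    return [sum(b * x for b, x in zip(row, Q)) + ci for row, ci in zip(B, c)]


def expected_run(B, c, c_star, kappa, p_sel, steps):
    """Iterate E(Q_{i+1}) = (1 - lam*p) E(Q_i) + lam*p T(E(Q_i)) from kappa * c*.

    In every step, assert c* <= T(E(Q_i)) <= E(Q_{i+1}) <= E(Q_i) pointwise
    (up to a small floating-point tolerance).
    """
    tol = 1e-12
    EQ = [kappa * x for x in c_star]
    for i in range(steps):
        lam = 1.0 / (i + 2)
        TQ = apply_T(B, c, EQ)
        nxt = [(1 - lam * p_sel) * q + lam * p_sel * t for q, t in zip(EQ, TQ)]
        for cs, t, n, q in zip(c_star, TQ, nxt, EQ):
            assert cs <= t + tol and t <= n + tol and n <= q + tol
        EQ = nxt
    return EQ


# Hypothetical BMC with fixed point c* = (3, 2); start from 2 * c*.
B = [[0.5, 0.25], [0.0, 0.5]]
c = [1.0, 1.0]
c_star = [3.0, 2.0]
final = expected_run(B, c, c_star, kappa=2.0, p_sel=0.5, steps=1000)
```

The expected values stay sandwiched between $c^*$ and their predecessors, so they decrease monotonically from above, as the lemma states.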


By simply replacing all $\le$ with $\ge$ in the above proof, we get the following
for all initial Q-values that happen to be $\kappa c^*$ where $\kappa \le 1$:


Lemma 9. Let $Q_0 = \kappa c^*$ for a scalar factor $\kappa \in [0,1]$. Then the following
holds for all $i \in \mathbb{N}$, assuming that the Q-value to update in each step is selected
independently at random: $c^* \ge T(E(Q_i)) \ge E(Q_{i+1}) \ge E(Q_i)$.


We now first establish that the distance between $Q$ and $c^*$ can be upper
bounded by the distance between $Q$ and $T(Q)$ with a fixed linear factor $\mu > 0$.


Lemma 10. There exists a constant $\mu > 0$ such that

\[ \sum_{q \in P} \bigl|Q_i(q) - T(Q_i)(q)\bigr| \ge \mu \sum_{q \in P} \bigl|Q_i(q) - c^*(q)\bigr| \]

when $Q_0 = \kappa c^*$.
Proof. We show this for $\kappa > 1$. The proof for $\kappa < 1$ is similar, and there is
nothing to show for $\kappa = 1$.

We first consider the linear programme with a variable for each type with
the following constraints for some fixed $\delta > 0$:

\[ Q \ge c^*, \quad T(Q) \le Q, \quad\text{and}\quad \sum_{q \in P} Q(q) = \sum_{q \in P} c^*(q) + \delta. \]

An example solution to this constraint system is
$Q = \bigl(1 + \delta / \sum_{q \in P} c^*(q)\bigr)\, c^*$.
We then find a solution minimising the objective $\sum_{q \in P} |(Q - T(Q))(q)|$, noting
that all entries are non-negative due to the constraint $T(Q) \le Q$. This is expressed by
adding $2|P|$ constraints

\[ x_q \ge Q(q) - T(Q)(q), \qquad x_q \ge T(Q)(q) - Q(q) \]

and minimising $\sum_{q \in P} x_q$.
As $c^*$ is the only fixed-point of $T$, and $\sum_{q \in P} Q(q) = \sum_{q \in P} c^*(q) + \delta$ implies
that, for an optimal solution $Q^*$, $Q^* \ne c^*$, we have that

\[ \sum_{q \in P} \bigl|Q^*(q) - T(Q^*)(q)\bigr| > 0. \]

Due to the constraint $Q \ge c^*$, we always have $Q = c^* + Q_\Delta$ for some $Q_\Delta \ge 0$.
We can now re-formulate this linear programme to look for $Q_\Delta$ instead of $Q$:

\[ Q_\Delta \ge 0, \quad B Q_\Delta \le Q_\Delta, \quad\text{and}\quad \sum_{q \in P} Q_\Delta(q) = \delta, \]

with the objective to minimise $\sum_{q \in P} |(Q_\Delta - B Q_\Delta)(q)|$.

The optimal solution $Q_\Delta^*$ to this linear programme gives an optimal value
$Q^* = c^* + Q_\Delta^*$ for the former and, vice versa, the value $Q^*$ for the former provides
an optimal solution $Q^* - c^*$ for the latter, and these two solutions have the same
value in their respective objective function.

Thus, while the former constraint system is convenient to show that the value
of the objective function is positive, the latter constraint system is, except for
$\sum_{q \in P} Q_\Delta(q) = \delta$, linear. This means that any optimal solution for $\delta = \delta_1$ can
be obtained from the optimal solution for $\delta = \delta_2$ just by rescaling it by $\delta_1/\delta_2$. It
follows that the optimal value of the objective function is linear in $\delta$, i.e., there
exists $\mu > 0$ such that its value is $\mu\delta$.


We now show that the sequence of Q-value updates converges in expectation
to $c^*$ when $Q_0 = \kappa c^*$.


Lemma 11. Let $Q_0 = \kappa c^*$ where $\kappa \ge 0$. Then, assuming that each type-action
pair is selected for update with a minimal probability $p_{\min}$ in each step, and that
$\sum_{i=0}^{\infty} \lambda_i = \infty$, $\lim_{i \to \infty} E(Q_i) = c^*$ holds.


Proof. We prove this for $\kappa \ge 1$. A similar proof shows this for any $\kappa \in [0,1]$.
Lemma 8 provides that all $E(Q_i)$ satisfy the constraints $E(Q_i) \ge c^*$ and
$T(E(Q_i)) \le E(Q_i)$.
Let $p_{\min}$ be the smallest probability any Q-value is selected with in each
update step. Due to Lemma 10, there is a fixed constant $\mu > 0$ such that

\[ \sum_{q \in P} Q_i(q) - T(Q_i)(q) \ge \mu \sum_{q \in P} Q_i(q) - c^*(q). \]

By taking the expected value of both sides and the fact that
$c^* \le T(E(Q_i)) \le E(Q_{i+1}) \le E(Q_i)$ due to Lemma 8, we get

\[ \sum_{q \in P} E(Q_i)(q) - T(E(Q_i))(q) \ge \mu \sum_{q \in P} E(Q_i)(q) - c^*(q), \]

then due to (9) we have

\[ \sum_{q \in P} E(Q_i)(q) - E(Q_{i+1})(q) \ge \mu\, p_{\min} \lambda_i \sum_{q \in P} E(Q_i)(q) - c^*(q), \]

and finally just by rearranging these terms we get

\[ \sum_{q \in P} E(Q_{i+1})(q) - c^*(q) \le (1 - \mu\, p_{\min} \lambda_i) \sum_{q \in P} E(Q_i)(q) - c^*(q). \]

Note that all summands are non-negative by Lemma 8.
With $\sum_{i=0}^{\infty} \lambda_i = \infty$, we get that $\sum_{i=0}^{\infty} \mu\, p_{\min} \lambda_i = \infty$, because $p_{\min}$ and $\mu$
are fixed positive values. This implies that $\prod_{i=0}^{\infty} (1 - \mu\, p_{\min} \lambda_i) = 0$ and so the
distance between $E(Q_i)$ and $c^*$ converges to 0.
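The last step of the proof relies on the fact that the product of the factors $(1 - \mu\, p_{\min} \lambda_i)$ vanishes exactly when the learning rates are not summable. The sketch below compares two hypothetical schedules; the constant `mu_p = 0.5` stands in for $\mu\, p_{\min}$ and is an arbitrary assumed value.

```python
def shrink_factor(rates, mu_p):
    """Product of (1 - mu_p * lam) over the given learning rates."""
    prod = 1.0
    for lam in rates:
        prod *= 1.0 - mu_p * lam
    return prod


n = 100_000
mu_p = 0.5  # hypothetical value of mu * p_min

# Harmonic rates: their sum diverges, so the product tends to 0.
harmonic = shrink_factor((1.0 / (i + 2) for i in range(n)), mu_p)

# Quadratically decaying rates: their sum converges, so the product
# stays bounded away from 0 and the distance need not vanish.
summable = shrink_factor((1.0 / (i + 2) ** 2 for i in range(n)), mu_p)
```

The harmonic product decays roughly like $n^{-1/2}$ here, so convergence in expectation can be slow even though it is guaranteed.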


Lemma 11 suffices to show convergence of Q-values in expectation.

Theorem 12. When each Q-value is selected for an update with a minimal
probability $p_{\min}$ in each step, and $\sum_{i=0}^{\infty} \lambda_i = \infty$, then $\lim_{i \to \infty} E(Q_i) = c^*$ holds
for every starting Q-values $Q_0 \ge 0$.


Proof. We first note that none of the entries of $c^*$ can be 0. This implies that
there is a scalar factor $\kappa \ge 0$ such that $0 \le Q_0 \le \kappa c^*$. As the $Q_i$ are monotone
in the entries of $Q_0$, and as the property holds for $Q_0' = 0 = 0 \cdot c^*$ and $Q_0'' = \kappa c^*$
by Lemma 11, the squeeze theorem implies that it also holds for $Q_0$.


Convergence of the expected value is a weaker property than expected
convergence, which also explains why our assumptions are weaker than in
Theorem 7. With the common assumption of sufficiently fast falling learning
rates, $\sum_{i=0}^{\infty} \lambda_i^2 < \infty$, we will now argue that the pointwise limes inferior of the
sequence of Q-values almost surely converges to $c^*$. This will later allow us to
infer convergence of the actual sequence of Q-values to $c^*$.


Theorem 13. When each Q-value is selected for update with a minimal
probability $p_{\min}$ in each step,

\[ \sum_{i=0}^{\infty} \lambda_i = \infty \quad\text{and}\quad \sum_{i=0}^{\infty} \lambda_i^2 < \infty, \]

then $\lim_{i \to \infty} Q_i = c^*$ holds almost surely for every starting Q-values $Q_0 \ge 0$.
Proof. We assume for contradiction that, for some $Q \ne c^*$, there is a non-zero
chance of a sequence $\{Q_i\}_{i \in \mathbb{N}}$ such that

- $\|Q - \liminf_{i \to \infty} Q_i\|_\infty < \epsilon'$ for all $\epsilon' > 0$, and
- there is a type $q$ such that $Q(q) < T(Q)(q)$.

Then there must be an $\epsilon > 0$ such that $Q(q) + 3\epsilon < T(Q - 2\epsilon \cdot \mathbf{1})(q)$. We fix such
an $\epsilon > 0$.

Now we have the assumption that the probability of $\|Q - \liminf_{i \to \infty} Q_i\|_\infty < \epsilon$
is positive. Then, in particular, the chance that, at the same time,
$\liminf_{i \to \infty} Q_i > Q - \epsilon \cdot \mathbf{1}$ and $\liminf_{i \to \infty} Q_i < Q + \epsilon \cdot \mathbf{1}$, is positive.

Thus, there is a positive chance that the following holds: there exists an $n_\epsilon$
such that, for all $i > n_\epsilon$, $Q_i \ge Q - 2\epsilon \cdot \mathbf{1}$. This implies

\[ T(Q_i)(q) \ge T(Q - 2\epsilon \cdot \mathbf{1})(q) > Q(q) + 3\epsilon. \]

Thus, the expected limit value of $Q_i(q)$ is at least $Q(q) + 3\epsilon$, for every tail of
the update sequence. Now, we can use $Q - 2\epsilon \cdot \mathbf{1}$ as a bound on the estimation of the
updates in Q-learning as $Q_i \ge Q - 2\epsilon \cdot \mathbf{1}$ holds. At the same time, the variation
of the sum of the updates goes to 0 when $\sum_{i=0}^{\infty} \lambda_i^2$ is bounded. Therefore, it
cannot be that $\liminf_{i \to \infty} Q_i < Q + \epsilon \cdot \mathbf{1}$ holds; a contradiction.

We note that if, for a Q-values $Q \ge 0$, there is a $q' \in P$ with $Q(q') < c^*(q')$,
then there is a $q \in P$ with $Q(q) < T(Q)(q)$ and $Q(q) < c^*(q)$. This is because,
for the Q-values $Q'$ with $Q'(q) = \min\{Q(q), c^*(q)\}$ for all $q \in P$, $Q' \le c^*$. Thus,
there must be a type $q \in P$ such that $\kappa = Q'(q)/c^*(q) < 1$ is minimal, and $Q' \ge \kappa c^*$.
As we have shown before, $T(\kappa c^*) = \kappa c^* - (\kappa - 1) c$, such that the following holds:

\[ T(Q)(q) \ge T(Q')(q) \ge T(\kappa c^*)(q) = \kappa c^*(q) + (1 - \kappa) c(q) > \kappa c^*(q) = Q(q). \]

Thus, we have that $\liminf_{i \to \infty} Q_i \ge c^*$ holds almost surely. With


5.2 Convergence for BMDPs and Finite c* 


We start with showing that, for BMDPs, the pointwise limes superior of each
sequence is almost surely less than or equal to $c^*$. We then proceed to show that
the limes inferior of a sequence is almost surely $c^*$, which together implies almost
sure convergence.


Lemma 14. When each Q-value of a BMDP is selected for update with a
minimal probability $p_{\min}$ in each step, $\sum_{i=0}^{\infty} \lambda_i = \infty$, and $\sum_{i=0}^{\infty} \lambda_i^2 < \infty$, then
$\limsup_{i \to \infty} Q_i \le c^*$ holds almost surely for every starting Q-values $Q_0 \ge 0$.


Proof. To show the property for the limes superior, we fix an optimal static
strategy $\sigma^*$ that exists due to Corollary 5.

We define a BMC obtained by replacing each type $q$ in the BMDP with
$A(q) = \{a_1, \ldots, a_k\}$ by $k$ types $(q,a_1), \ldots, (q,a_k)$ with one action, where each
type $q'$ is replaced by the type-action pair $(q', \sigma^*(q'))$.
It is easy to see that a type $(q, \sigma^*(q))$ for the resulting BMC has the same
value as the type $q$ and the type-action pair $(q, \sigma^*(q))$ in the BMDP that we
started with.

When identifying these corresponding type-action pairs, we can look at the
same sampling for the BMDP and the BMC, leading to sequences $Q_0, Q_1, Q_2, \ldots$
and $Q_0', Q_1', Q_2', \ldots$, respectively, where $Q_0 = Q_0'$.

It is easy to see by induction that $Q_i \le Q_i'$. Considering that $\{Q_i'\}_{i \in \mathbb{N}}$ almost
surely converges to $c^*$ by Theorem 13, we obtain our result.


Theorem 15. When each Q-value of a BMDP is selected for update with a
minimal probability $p_{\min}$, $\sum_{i=0}^{\infty} \lambda_i = \infty$, and $\sum_{i=0}^{\infty} \lambda_i^2 < \infty$, then $\lim_{i \to \infty} Q_i = c^*$
holds almost surely for every starting Q-values $Q_0 \ge 0$.


Proof. As a first simple corollary from Lemma 14, we get the same result for the
limes inferior (as $\liminf \le \limsup$ must hold).

We now assume for contradiction that, for some vector $Q < c^*$, there is a
non-zero chance of a sequence $\{Q_i\}_{i \in \mathbb{N}}$ such that $\|Q - \liminf_{i \to \infty} Q_i\|_\infty < \epsilon'$
for all $\epsilon' > 0$.

As $Q$ is below the fixed point of $T$, there must be one type-action pair
$(q, \sigma^*(q))$ such that $Q(q, \sigma^*(q)) < T(Q)(q, \sigma^*(q))$ (cf. the proof of Theorem 13).
Moreover, there must be an $\epsilon > 0$ such that

\[ Q(q, \sigma^*(q)) + 3\epsilon < T(Q - 2\epsilon \cdot \mathbf{1})(q, \sigma^*(q)). \]

We fix such an $\epsilon > 0$.

Now we assume that the probability of $\|Q - \liminf_{i \to \infty} Q_i\|_\infty < \epsilon$ is positive.
Then the chance that, simultaneously, $\liminf_{i \to \infty} Q_i(q, \sigma^*(q)) > Q(q, \sigma^*(q)) - \epsilon$
and $\liminf_{i \to \infty} Q_i(q, \sigma^*(q)) < Q(q, \sigma^*(q)) + \epsilon$, is positive.

Thus, there is a positive chance that the following holds: there exists an $n_\epsilon$
such that, for all $i > n_\epsilon$, we have $Q_i \ge Q - 2\epsilon \cdot \mathbf{1}$. This entails

\[ T(Q_i)(q, \sigma^*(q)) \ge T(Q - 2\epsilon \cdot \mathbf{1})(q, \sigma^*(q)) > Q(q, \sigma^*(q)) + 3\epsilon. \]

Thus, the expected limit value of $Q_i(q, \sigma^*(q))$ is at least $Q(q, \sigma^*(q)) + 3\epsilon$, for
every tail of the update sequence. Now, we can use $T(Q - 2\epsilon \cdot \mathbf{1})(q, \sigma^*(q))$ as
a bound on the estimation of $T(Q)(q, \sigma^*(q))$ during the update of the Q-value
of the type-action pair $(q, \sigma^*(q))$. At the same time, the variation of the sum of
the updates goes to 0 when $\sum_{i=0}^{\infty} \lambda_i^2$ is bounded. Therefore, it cannot be that
$\liminf_{i \to \infty} Q_i(q, \sigma^*(q)) < Q(q, \sigma^*(q)) + \epsilon$ holds; a contradiction.


5.3 Divergence 


We now show divergence of $Q(q)$ to $\infty$ when at least one of the entries of $c^*(q)$
is infinite. First, due to Theorem 6 and its proof, we have that $c^* = \sum_{i=0}^{\infty} B^i c$ for
some non-negative $B$ and positive $c$. Therefore $c^*$ is monotonic in $B$ for BMCs.
Likewise, the value of $c^*$ for a BMDP depends only on the cost function and the
expected number of successors of each type spawned: two BMDPs with the same
cost functions and the same expected numbers of successors have the same fixed point
$c^*$. Thus, if a type $q$ with one action spawns either exactly one $q$ or exactly
one $q'$ with a chance of 50% each, or if it spawns 10 successors of type $q$ and
another 10 of type $q'$ with a chance of 5%, while dying without offspring with
a chance of 95%, both lead to identical matrices $B$ and so the same $c^*$ (though
this difference may impact the performance of Q-learning).
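The two offspring distributions from this example can be compared directly. The sketch below computes the corresponding row of the matrix $B$ for each of them; the tuple-based encoding of offspring and the helper name `expected_successors` are assumptions made for this illustration.

```python
import math


def expected_successors(dist, types):
    """Row of B for one type: expected number of spawned entities per type.

    dist maps each offspring tuple to its probability.
    """
    return [sum(p * alpha.count(t) for alpha, p in dist.items()) for t in types]


types = ["q", "q'"]

# Variant 1: exactly one q or exactly one q', with a chance of 50% each.
one_child = {("q",): 0.5, ("q'",): 0.5}

# Variant 2: ten q and ten q' with a chance of 5%, extinction otherwise.
burst = {("q",) * 10 + ("q'",) * 10: 0.05, (): 0.95}

row_one_child = expected_successors(one_child, types)
row_burst = expected_successors(burst, types)
```

Both rows equal (0.5, 0.5) up to floating-point rounding, so the two variants share the same fixed point, although the second has a much larger variance in the number of entities per step.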

Naturally, raising the expected number of successors of any type
for any type-action pair strictly raises $c^*$, while lowering it reduces $c^*$, and for
every set of expected numbers, the value of $c^*$ is either finite or infinite.

Let us consider a set of parameters at the fringe of finite vs. infinite $c^*$, and
let us choose them pointwise not larger than the parameters from the BMC or
BMDP under consideration. As the fixed point from Sect. 3 is clearly growing
continuously in the parameter values, this set of expected successors leads to a
$c^*$ which is not finite.

We now look at the family of parameter values that lead to $\alpha \in [0,1[$ times the
expected successors from our chosen parameter at the fringe between finite and
infinite values, and refer to it as the $\alpha$-BMDP. Let also $c^*_\alpha$ denote the fixed point
for the reduced parameters. As the solution to the fixed point grows continuously,
so does $c^*_\alpha$. Moreover, if $c^*_1 = \lim_{\alpha \to 1} c^*_\alpha$ was finite, then $c^*$ would be finite as
well, because then $c^*_1 = c^*$.

Clearly, for all parameters $\alpha \in [0,1[$, the Q-values of an $\alpha$-BMC or $\alpha$-BMDP
converge against $c^*_\alpha$. Thus, the Q-values for the BMC or BMDP we have started
with converge against a value which is at least $\sup_{\alpha \in [0,1[} c^*_\alpha$. As this is not a
finite value, Q-learning diverges to $\infty$.


6 Experimental Results 


We implemented the algorithm described in the previous section in the formal 
reinforcement learning tool MUNGOJERRIE [21], a C++-based tool which 
reads BMDPs described in an extension of the PRISM language [18]. The tool 
provides an interface for RL algorithms akin to that of [3] and invokes a linear 
programming tool (GLOP) [22] to compute the optimal expected total cost based 
on the optimality equations (@). 


6.1 Benchmark Suite 


The BMDPs on which we tested Q-learning are listed in Table 1. For each model,
the number of types in the BMDP is given. Table 1 also shows the total cost
(as computed by the LP solver), which has full access to the BMDP. This is
followed by the estimate of the total cost computed by Q-learning and the time
taken by learning. The learner has several hyperparameters: $\epsilon$ is the exploration
rate, $\alpha$ is the learning rate, and tol is the tolerance for Q-values to be considered
different when selecting an optimal strategy. Finally, ep-l is the maximum episode
length and ep-n is the number of episodes. The last two columns of Table 1


670 E. M. Hahn et al. 


report the values of ep-l and ep-n when they deviate from the default values. All 
performance data are the averages of three trials with Q-learning. Since costs 
are undiscounted, the value of a state-action pair computed by Q-learning is a 
direct estimate of the optimal total cost from that state when taking that action. 


Table 1. Q-learning results. The default values of the learner hyperparameters are:
$\epsilon$ = 0.1, $\alpha$ = 0.1, tol = 0.01, ep-l = 30, and ep-n = 20000. Times are in seconds.


Name Types | Optimal cost | Estimated cost Time (avg.) | ep-l | ep-n
cloud1 3 5 5.026 0.369
cloud2 4 5 5.016 0.369
bacteria1 3 2.5 2.514 0.374
bacteria2 3 1.34831 1.413 0.387
protein 3 6 5.067 0.372
frozenSmall | 16 1.84615 1.740 2.834 100
rand68 10 150.432 154.400 0.402
rand283 9 4 4 0.075 1000
rand945 19 212 208.177 10.756 200 | 40000
rand3242 43 4 4.372 5.960 100
rand6417 62 10 10 12.498 50

Models cloud1 and cloud2 are based on the motivating example given in
the introduction. Examples bacteria1 and bacteria2 model the population
dynamics of a family of two bacteria [28] subject to two treatments. The objective
is to determine which treatment results in the minimum expected cost to
extinction of the bacteria population. The protein example models a stochastic
Petri net description [19] corresponding to a protein synthesis example with
entities corresponding to active and inactive genes and proteins. The example
frozenSmall [3] is similar to the classical frozen lake example, except that one of
the holes results in branching the process into two entities. Entities that fall in
the target cell become extinct. The objective is to determine a strategy that
results in a minimum number of steps before extinction. Finally, the remaining
5 examples are randomly created BMDP instances.


7 Conclusion 


We study the total reward optimisation problem for branching decision processes 
with unknown probability distributions, and give the first reinforcement learning 
algorithm to compute an optimal policy. Extending Q-learning is hard, even 
for branching processes, because they lack a central property of the standard 
convergence proof: as the value range of the Q-table is not a priori bounded 
for a given starting table $Q_0$, the variation of the disturbance is not bounded.




This looks like a more substantial obstacle than the one Q-learning faces 
when maximising undiscounted rewards for finite-state MDPs, and it is well 
known that this defeats Q-learning. So it is quite surprising that we could 
not only show that Q-learning works for branching processes, but extend these 
results to branching decision processes, too. Finally, in the previous section, we 
have demonstrated that our Q-learning algorithm works well on examples of 
reasonable size even with default hyperparameters, so it is ready to be applied 
in practice without the need for excessive hyperparameter tuning. 
Abstract. We present Cameleer, an automated deductive verification
tool for OCaml. We leverage the recently proposed GOSPEL (Generic
OCaml SPEcification Language) to attach rigorous, yet readable, behavioral
specifications to OCaml code. The formally-specified program is fed
to our toolchain, which translates it into an equivalent one in WhyML,
the programming and specification language of the Why3 verification
framework. We report on successful case studies conducted in Cameleer.
1 Introduction


Over the past decades, we have witnessed a tremendous development in the field
of deductive software verification [11], the practice of turning the correctness of
code into a mathematical statement and then proving it. Interactive proof assistants
have evolved from obscure and mysterious tools into de facto standards
for proving industrial-size projects. On the other end of the spectrum, the so-called
SMT revolution and the development of reusable intermediate verification
infrastructures contributed decisively to the development of practical automated
deductive verifiers.

Despite all the advances in deductive verification and proof automation, little
attention has been given to the family of functional languages [27]. Let us
consider, for instance, the OCaml language. It is well suited for verification, given
its well-defined semantics, clear syntax, and state-of-the-art type system. Yet,
the community still lacks an easy-to-use framework for the specification and
verification of OCaml code. Working programmers must either re-implement
their code in a proof-aware language (and then rely on code extraction), or they
must turn to interactive frameworks. Cameleer fills the gap, being
a tool for the deductive verification of programs written in OCaml, with a clear
focus on proof automation. Cameleer uses the recently proposed GOSPEL [5], a
specification language for OCaml. We advocate here the vision of the specifying
programmer: the person who writes the code should also be able to naturally provide
a suitable specification. GOSPEL terms are written in a subset of the OCaml
language, which makes them more appealing to the regular programmer. Moreover,
we believe specification and implementation should co-exist and evolve
together, which is exactly the approach followed in Cameleer.

Cameleer takes as input a GOSPEL-annotated OCaml program and translates 
it into an equivalent counterpart in WhyML, the programming and specifica- 
tion language of the Why3 framework [16]. Why3 is a toolset for the deductive 
verification of software, clearly oriented towards automated proof. A distinctive 
feature of Why3 is that it interfaces with several different off-the-shelf theorem 
provers, namely SMT solvers. 


Contributions. To the best of our knowledge, Cameleer is the first deductive 
verification tool for annotated OCaml programs. It handles a realistic subset of 
the language, and its interaction with the Why3 verification framework greatly 
increases proof automation. Our set of case studies successfully verified with the 
Cameleer tool constitutes, by itself, an important contribution towards building 
a comprehensive body of verified OCaml codebases. Finally, it is worth noting 
that the original presentation of GOSPEL was limited to the specification of
interface files. In the scope of this work, we have extended it to include implementation
primitives, such as loop invariants and ghost code (i.e., code that has
no computational purpose and is used only to ease specification and proof effort),
evolving GOSPEL from an interface specification language into a more mature
proof language.


2 Illustrative Example — Binary Search 


Higher-Order Implementation. Fig. 1 presents an implementation of binary 
search, where the comparison function, cmp, is given as an argument to the 
main function. For the sake of readability, we give the type of arguments and 
return value of function binary_search, but these can be inferred by the OCaml 
compiler. 

The function contract is given after its definition as a GOSPEL annotation,
written within comments of the form (*@ ... *). The first line names the
returned value. Next, the first precondition establishes that cmp is a total
pre-order following the OCaml convention: if x is smaller than y, then cmp x y
< 0; if x is greater than y, then cmp x y > 0; finally, cmp x y = 0 if x and y
are equal values¹. It is worth noting that GOSPEL, hence Cameleer, assumes cmp
to be a pure function (i.e., a function without any form of side-effects). The
second precondition requires the array to be sorted according to the cmp relation.
Finally, the last two clauses capture the possible outcomes of execution: the
regular postcondition (ensures clause) states the returned index is within the
bounds of a and its value is equal to v; the exceptional postcondition (raises)


¹ For the sake of space, we omit the definition of predicate is_total_pre_order.
let binary_search (cmp: 'a -> 'a -> int) (a: 'a array) (v: 'a) : int =
  let l = ref 0 in
  let u = ref (length a - 1) in
  let exception Found of int in
  try while !l <= !u do
      (*@ variant !u - !l *)
      (*@ invariant 0 <= !l && !u < length a *)
      (*@ invariant forall i. 0 <= i < length a -> cmp a.(i) v = 0 ->
                      !l <= i <= !u *)
      let m = !l + (!u - !l) / 2 in
      let c = cmp a.(m) v in
      if c < 0 then l := m + 1
      else if c > 0 then u := m - 1
      else raise (Found m)
    done;
    raise Not_found
  with Found i -> i
(*@ i = binary_search cmp a v
      requires is_total_pre_order cmp
      requires forall i j. 0 <= i <= j < length a -> cmp a.(i) a.(j) <= 0
      ensures  0 <= i < length a && cmp a.(i) v = 0
      raises   Not_found -> forall i. 0 <= i < length a -> cmp a.(i) v <> 0 *)


Fig. 1. Binary search with the comparison function passed as an argument.


states that whenever exception Not_found is raised, there is no index within
bounds whose value is equal to v. As usual in deductive verification, the presence
of the while loop requires one to supply a loop invariant. Here, it boils down to
the two invariant clauses, which state that the limits of the search space are always
within the bounds of a and that every index i for which a.(i) is equal to v
must be within the limits of the current search space. We also provide a
decreasing measure (variant) in order to prove loop termination.

Assuming file binary_search.ml contains the program of Fig. 1, starting a
proof with Cameleer is as easy as typing cameleer binary_search.ml in a terminal.
Users are immediately presented with the Why3 IDE, where they can conduct
the proof. Twelve verification conditions are generated for binary_search:
two for loop invariant initialization, four for loop invariant preservation (two for each
branch of if..then..else), two for safety (check division by zero and index in
array bounds), two for loop termination (one for each branch), and finally one
for each postcondition. All of these are easily discharged by SMT solvers.
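Cameleer establishes the two loop invariants statically for all inputs. Purely as an informal aid to intuition, the following Python sketch re-implements the same search and checks both invariants at runtime on concrete inputs. It is not produced by Cameleer or Why3, the choice of Python and of `LookupError` in place of Not_found is ours, and passing assertions on a few inputs is of course no substitute for the proof.

```python
def binary_search(cmp, a, v):
    """Binary search over a sorted list, asserting the loop invariants of Fig. 1."""
    l, u = 0, len(a) - 1
    while l <= u:
        # Invariant 1: the search limits stay within the bounds of a.
        assert 0 <= l and u < len(a)
        # Invariant 2: every index whose value equals v lies in [l, u].
        assert all(l <= i <= u for i in range(len(a)) if cmp(a[i], v) == 0)
        m = l + (u - l) // 2
        c = cmp(a[m], v)
        if c < 0:
            l = m + 1
        elif c > 0:
            u = m - 1
        else:
            return m
    raise LookupError("value not found")


def int_cmp(x, y):
    """Comparison following the OCaml convention: negative, zero, or positive."""
    return (x > y) - (x < y)


found_index = binary_search(int_cmp, [1, 3, 5, 7], 5)

try:
    binary_search(int_cmp, [1, 3, 5, 7], 4)
    missing_raised = False
except LookupError:
    missing_raised = True
```

The runtime check of the second invariant scans the whole array in every iteration, which defeats the logarithmic running time; this cost is exactly what a static proof avoids.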


Functor-Based Implementation. Fig. 2 depicts (the skeleton of) an alternative
implementation of the binary search routine. Instead of
passing the comparison function as an argument of binary_search, here the
functor Make takes as argument a module of type OrderedType, which provides
a monomorphic comparison function over a type t. This is the same approach
found in the OCaml standard library, namely in the Set and Map modules. The
[@logic] attribute instructs Cameleer that cmp is both a programming and a logical
function. This is what allows us to provide the axiom about the behavior of cmp.

Other than the call to Ord.cmp, the implementation and specification of
binary_search do not change, hence we omit them here. When fed into Cameleer,
the functorial implementation generates the exact same twelve verification conditions
as the higher-order counterpart, all of them easily discharged as well.
Thus, the use of a functor does not impose any verification burden, showing the
flexibility of Cameleer to handle different idiomatic OCaml programming styles.


module type OrderedType = sig
  type t
  val[@logic] cmp: t -> t -> int
  (*@ axiom total_pre_order: is_total_pre_order cmp *)
end

module Make (Ord: OrderedType) = struct
  let binary_search a v =
    ...
    try while !l <= !u do
      let c = Ord.cmp a.(m) v in
      ...
  (*@ i = binary_search a v ... *)
end


Fig. 2. Binary search implemented as a functor.
Fig. 3. Cameleer verification workflow.


3 Implementation 


Cameleer Workflow. Figure 3 depicts the verification workflow of the Cameleer
tool. We use the GOSPEL toolchain² in order to parse and manipulate (via the
ppxlib library) the abstract syntax tree of the GOSPEL-annotated OCaml program.
A dedicated parser and type-checker (extended to handle implementation
features) treat GOSPEL special comments and attach the generated specification
to nodes in the OCaml AST. Cameleer translates the decorated AST into an


² https://github.com/ocaml-gospel/gospel.
equivalent WhyML representation, which is then fed to Why3. The Why3 type-and-effect
system might reject the input program, in which case the reported
error is propagated back to the level of the original OCaml code. Otherwise, if
the translated program fits Why3 requirements, the underlying VCGen computes
a set of verification conditions that can then be discharged by different solvers.
Throughout this pipeline, the user only has to write the OCaml code and
GOSPEL specification (represented in Fig. 3 as a full-lined box), while every other
element is automatically generated (dash-lined boxes). The user never needs to
manipulate or even care about the generated WhyML program. In short, the
Cameleer user intervenes at the beginning and at the end of the process, i.e., in
the initial specifying phase and in the last step, helping Why3 to close the proof.
Our development effort currently amounts to 1.8K non-blank lines of OCaml
code.


Translation into WhyML. The core of Cameleer is a translation from GOSPEL- 
annotated OCaml code into WhyML. In order to guide our implementation effort, 
we have defined such a translation as a set of inductive inference rules between 
the source and target languages [26]. Here, rather than focusing on more funda- 
mental aspects, we give a brief overview of how the translation works in practice. 

OCaml and WhyML are both dialects of the ML family, sharing many syntactic
and semantic traits. Hence, translation of OCaml expressions and declarations
into WhyML is rather straightforward: GOSPEL annotations are readily
translated into WhyML specification, while supported OCaml programming
constructions (including ghost code) are easily mapped into semantically-equivalent
WhyML constructions. Consider, for instance, the following piece of OCaml code:


type 'a non_empty_list = { self: 'a list }
(*@ invariant self <> [] *)

let [@ghost] hd (l: 'a non_empty_list) = match l with
  | [] -> assert false
  | x :: _ -> x
(*@ r = hd l
      ensures match l with
              | [] -> false
              | x :: _ -> r = x *)


For this case, Cameleer generates the following WhyML program:


type non_empty_list 'a = { self: list 'a }
  invariant { self <> Nil }

let ghost hd (l: non_empty_list 'a)
  returns { r -> match l with
                 | Nil -> false
                 | Cons x _ -> x = r end }
= match l with
  | Nil -> absurd
  | Cons x _ -> x end
Other than the small syntactic differences, the generated WhyML program is
identical to the original OCaml one. In particular, the @ghost annotation
generates a ghost function in WhyML, while the assert false expression (which is
treated in a special way by the OCaml type-checker) is translated into the absurd
construction, with the same semantics. Supplied annotations, in this case
postcondition and type invariant, are readily mapped into equivalent specification.

The translation of the OCaml module language is more interesting and 
involved. A WhyML program is a list of modules, a module is a list of top-level 
declarations, and declarations can be organized within scopes, the WhyML unit 
for namespace management. However, there is no dedicated syntax for functors
on the Why3 side. These are represented, instead, as modules containing only 
abstract symbols [17]. Thus, when translating OCaml functors into WhyML, we 
need to be more creative. If we consider, for instance, the Make functor from 
Fig. 2, Cameleer will generate the following WhyML program: 


scope Make
  scope Ord
    type t

    val function cmp t t : int
    axiom total_pre_order: is_total_pre_order cmp
  end

  let binary_search a v = ...
end


The functor argument Ord is encoded as a nested scope inside Make. This means
the binary_search implementation can access any symbol from the Ord namespace,
via name qualification (e.g., Ord.t and Ord.cmp).


Interaction with Why3. One distinguishing feature of the Why3 architecture is 
that it can be extended to accommodate new front-end languages [32, Chap. 4]. 
Building on the devised OCaml to WhyML translation scheme, we use the Why3 
API to build an in-memory representation of the WhyML program. We also 
register OCaml as an admissible input language for Why3, which amounts to 
instructing Why3 to recognize .ml files as a valid input format and triggering
our translation in such case. Following this integration, we can use any Why3 
tool, out of the box, to process a .ml file. We are currently using the extract 
and session tools: the latter to gather statistics about number of generated 
verification conditions and proof time; the former to erase ghost code. 


Limitations of Using Why3. The WhyML specification sub-language and 
GOSPEL are similar. Moreover, they share some fundamental principles, namely 
the arguments of functions are not aliased by construction and each data struc- 
ture carries an implicit representation predicate. However, one can use GOSPEL 
to formally specify OCaml programs which cannot be translated into WhyML. 
This is evident when it comes to recursive mutable data structures. Consider, 
for instance, the cell type from the Queue module of the OCaml standard
library³:


type 'a cell = Nil | Cons of { content: 'a; mutable next: 'a cell }


As we attempt to translate such data type, Why3 emits the following error: 


This field has non-pure type, it cannot be used in a recursive 
type definition 


Recursive mutable data types are beyond the scope of Why3’s type-and-effect 
discipline [14], since these can introduce arbitrary memory aliasing which breaks 
the bounded-mutability principle of Why3 (i.e., all aliases must be statically- 
known). The solution would be to resort to an axiomatic memory model of 
OCaml in Why3, or to employ a richer program logic, e.g., Separation Logic [28] 
or Implicit Dynamic Frames [31]. We describe such an extension as future work 
(Sect. 6). 
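
To make the aliasing problem concrete, the sketch below builds two lists that share a tail cell and then mutates that cell through one of them. Only the cell type comes from the text above; the helper names (length, set_next) and the sample values are assumptions made for illustration.

```ocaml
(* Sketch only: run-time aliasing through a mutable recursive field.
   The cell type is the one quoted above; everything else is illustrative. *)
type 'a cell = Nil | Cons of { content : 'a; mutable next : 'a cell }

(* Number of cells reachable by following next. *)
let rec length = function
  | Nil -> 0
  | Cons { next; _ } -> 1 + length next

(* Overwrite the next field of a non-empty cell. *)
let set_next c n =
  match c with
  | Cons r -> r.next <- n
  | Nil -> invalid_arg "set_next: Nil"

(* a and b both point to the same tail cell. *)
let shared = Cons { content = 3; next = Nil }
let a = Cons { content = 1; next = shared }
let b = Cons { content = 2; next = shared }

let before = length b

(* Extend the list reachable from a; b is never mentioned here. *)
let () =
  match a with
  | Cons { next; _ } -> set_next next (Cons { content = 4; next = Nil })
  | Nil -> ()

let after = length b
```

The length of b changes although b is never named in the update. A discipline in which all aliases must be statically known cannot track this kind of sharing, which is consistent with Why3 rejecting the type definition.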


4 Evaluation 


In order to assess the usability and performance of Cameleer, we have put 
together a test suite of over 1000 lines of OCaml code. The reported case studies 
are all automatically verified. To build our gallery of verified programs we used 
a combination of Alt-Ergo 2.4.0, CVC4 1.8, and Z3 4.8.6. Figure 4 summarizes 
important metrics about our verified case studies: the number of generated
verification conditions for each example; the total lines of OCaml code, GOSPEL
specification, and ghost code (these are also included in the number of OCaml
LOC), respectively; the time it takes (in seconds) to replay a proof; and finally,
whether the proof is immediately discharged, i.e., no extra user effort is required
other than writing down suitable specification.

Our test bed includes OCaml implementations issued from realistic and massively
used programming libraries: the List.fold_left iterator and Stack module
from the OCaml standard library; the Leftist Heap implementation from
ocaml-containers⁴; finally, the applicative Queue module from OCamlgraph⁵.
We have used Cameleer to verify programs of different nature. These include: 
numerical programs (e.g., binary multiplication and fast exponentiation); sorting 
and searching (e.g., binary search and insertion sort); logical algorithms (con- 
version of a propositional formula into conjunctive normal form); array scanning 
(finding duplicate values in an array of integers); small-step iterators; data struc- 
tures implemented as functors (e.g., Pairing Heaps and Binary Search Trees); 
historical algorithms (checking a large routine by Turing, Boyer-Moore's majority
algorithm, FIND by Hoare, and binary tree same fringe); examples in Rustan
Leino's forthcoming textbook “Program Proofs”; and higher-order implementations
(height of a binary tree computed in CPS). Both small-step iterators and


3 https://caml.inria.fr/pub/docs/manual-ocaml/libref/Queue.html
4 https://github.com/c-cube/ocaml-containers/blob/master/src/core/CCHeap.ml
5 https://github.com/backtracking/ocamlgraph/blob/master/src/lib/persistentQueue.ml
Case study               | # VCs          | LOC / Spec. / Ghost      | Proof time (s) | Immediate
Applicative Queue        | 23             | 25 / 17 / 4              |  1.26          | ✓
Arithmetic Compiler      | 258            | 235 / [garbled: AML 155] | 16.31          | X
Binary Multiplication    | 12             | 10 / 6 / 0               |  0.69          | ✓
Binary Search            | 37             | 62 / 40 / 0              |  1.23          | ✓
Binary Search Trees      | 31             | 20 / 26 / 0              |  1.45          | X
Checking a Large Routine | 16             | 25 / 15 / 0              |  0.75          | ✓
CNF Conversion           | 93             | 113 / 47 / 14            |  2.92          | ✓
Duplicates in an Array   | [garbled: iil] | 10 / 9 / 0               |  0.63          | ✓
Ephemeral Queue          | 44             | 40 / 29 / 7              |  1.34          | ✓
Even-odd Test            | 6              | 6 / 8 / 0                |  0.55          | ✓
Factorial                | 8              | 10 / 9 / 0               |  0.64          | ✓
Fast Exponentiation      | 5              | 4 / 5 / 0                |  0.62          | ✓
Fibonacci                | 15             | 16 / 15 / 2              |  0.64          | ✓
FIND Algorithm           | 6              | 13 / 7 / 0               |  0.57          | ✓
Insertion Sort           | 17             | 13 / 34 / 0              |  1.28          | ✓
Integer Square Root      | 11             | 3 / 16 / 0               |  0.63          | ✓
Leftist Heap             | 161            | 99 / 178 / 11            |  4.33          | ✓
Mjrty                    | 25             | 33 / 12 / 0              |  2.56          | ✓
OCaml List.fold_left     | 28             | 5 / 21 / 0               |  0.79          | X
OCaml Stack              | 22             | 25 / [garbled: Be ft i]  |  0.89          | ✓
Pairing Heap             | 70             | 65 / 101 / 29            |  2.30          | X
Program Proofs           | 63             | 93 / 54 / 24             |  1.60          | X
Same Fringe              | 23             | 22 / 16 / 0              |  0.78          | ✓
Small-step Iterators     | 46             | [garbled: AD) En 2]      |  2.01          | X
Tree Height CPS          | 4              | 8 / 8 / 0                |  0.80          | ✓
Union Find               | 67             | 36 / 29 / 7              |  6.19          | ✓


Fig. 4. Summary of the case studies verified with the Cameleer tool. 


the list-fold function use a modular approach to reason about iteration [18]. 
Our largest case study to date is a toy compiler from arithmetic expressions 
to a stack machine, while Union Find features the most involved, but very ele- 
gant, specification. The former is inspired by the presentation in Nielsons’ text- 
book [25]; the latter follows recently proposed specification techniques [7,12] to 
achieve fully automatic proofs of correctness and termination. 

The runtimes shown in Fig. 4 were measured by averaging over ten runs 
on a Lenovo Thinkpad X1 Carbon 8th Generation, running Linux Mint 20.1, 
OCaml 4.11.1, and Why3 1.3.3 (developer version). They show that Cameleer can 
effectively verify realistic OCaml code in a reasonable amount of time. Following good
practices in deductive verification, Cameleer allows the user to write ghost code in 
order to ease proof and specification. The number of lines of ghost code in Fig. 4 
stands for ghost fields in record types, ghost functions, and lemma functions. 
In particular, the arithmetic compiler example uses lemma functions to prove, 
by induction, results about semantics preservation. Finally, case studies marked 
with X required some form of manual interaction in the Why3 IDE [9]. These 
are very simple proofs by induction (of auxiliary lemmas) and case analysis, in
order to better guide SMT solvers. 

From our experience developing this gallery of verified programs, we believe 
the required annotation effort is reasonable, although non-negligible. Some case 
studies, namely the Heap implementations, feature a considerable amount of lines 
of GOSPEL specification. However, these are classic definitions (e.g., minimum 
element) and results (e.g., the root of the Heap is the minimum element), which 
are easily adapted to any variant of Heap implementation. 


5 Related Work 


Automated Deductive Verification. One can cite Why3, F* [1], Dafny [23], and 
Viper [24] as successful automated deductive verification tools. Formal proofs are 
conducted in the proof-aware language of these frameworks, and then executable 
reliable code can be automatically extracted. In the Cameleer project, we chose 
to develop a verification tool that accepts as input a program written directly in 
OCaml, instead of a dedicated proof language. This obviates the need to re-write 
entire OCaml codebases (e.g., libraries), just for the sake of verification. 

Regarding tools that tackle the verification of programs written in main- 
stream languages, one can cite Frama-C [21] (for the C language), VeriFast [20] 
(C and Java), Nagini [10] (Python), Leon [22] (Scala), Spec# [3] (C#), and 
Prusti [2] (Rust). Despite the remarkable case studies verified with these tools, 
programs written in these languages can quickly degenerate into a nightmare
of pointer manipulation and tricky semantics issues. We argue the OCaml
language presents a number of features that make it a better target for formal 
verification. 

Finally, language-based approaches offer an alternative path to the verifica- 
tion of software. Liquid Haskell [34] extends standard Haskell types with Liquid 
Types [29], a form of refinement types [30], in order to prove properties about
realistic Haskell programs [33]. In this approach, verification conditions are gen- 
erated and discharged during type-checking. This is also its major weakness: in 
order to remain decidable, the expressiveness of the refinement language is hin- 
dered. In Cameleer, the use of GOSPEL allows us to provide rich specification to 
relevant case studies, while still achieving good proof automation results. 


Deductive Verification of OCaml Programs. Prior to our work, CFML [4] and 
coq-of-ocaml [8] were the only available tools for the deductive verification of
OCaml-written code, via translation into the Coq proof language. On one hand, 
CFML features an embedding of a higher-order Separation Logic in Coq, together 
with a characteristic formulae generator. On the other hand, coq-of-ocaml 
compiles non-mutable OCaml programs to pure Gallina code. These two tools 
have been successfully applied to the verification of non-trivial case studies, such 
as the correctness and worst-case amortized complexity bound of a cycle detection
algorithm [19], as well as part of the Tezos blockchain protocol⁶. However, they
still require a tremendous level of expertise and manual effort from users. Also, no
behavioral specification is provided with the OCaml implementation. The user
must write specification at the level of the generated code, which breaks our
vision that implementation and specification must coexist and evolve together.


6 https://clarus.github.io/coq-of-ocaml/examples/tezos/

The VOCaL project aims at developing a mechanically verified OCaml 
library [6]. One of the main novelties of this project is the combined use of 
three different verification tools: Why3, CFML, and Coq. The GOSPEL specifi- 
cation language was developed in the scope of this project, as a tool-agnostic 
language that could be manipulated by any of the three mentioned frameworks. 
Up to this point, the three mentioned tools were only using GOSPEL for inter- 
face specification, and not as a proof language. We believe the Cameleer approach 
nicely complements the existing toolchains [13] in the VOCaL ecosystem. 


6 Conclusions and Future Work 


In this paper we presented Cameleer, a tool for automated deductive verification 
of OCaml programs, with bounded mutability. We use the recently proposed 
GOSPEL language, which we also extended in the scope of this work, in order to 
attach formal specification to an OCaml program. Cameleer fills a gap in the
OCaml community, by providing programmers with a tool to directly specify and 
verify their implementations. By departing from the interactive-based approach, 
we believe Cameleer can be an important step towards bringing more OCaml 
programmers to include formal methods techniques in their daily routines. 

The core of Cameleer is a translation from OCaml annotated code into 
WhyML. The two languages share many common traits (both in their syntax and 
semantics), so it makes sense to target this intermediate verification language in 
the first major iteration of Cameleer. We have successfully applied our tool and 
approach to the verification of several case studies. These include implementa- 
tions issued from existing libraries, and scale up to data structures implemented 
as functors and tricky effectful computations. In the future, we intend to apply 
Cameleer to the verification of even larger case studies. 


What We Do Not Support. Currently, we target a subset of the OCaml language 
which roughly corresponds to caml-light, with basic support for the module 
language (including functors). Also, WhyML limits effectful computations to the 
cases where alias information is statically known, which limits our support for
higher-order functions and mutable recursive data structures. Adding support for 
the objective layer of the OCaml language would require a major extension to the 
GOSPEL language and a redesign of our translation into WhyML. Nonetheless, 
Why3 has been used in the past to verify Java-written programs [15], so in 
principle an encoding of OCaml objects in WhyML is possible. 

We do not support some of the more advanced type features in OCaml, namely 
Generalized Algebraic Data Types (GADTs) and polymorphic variants. One 
way to support such constructions would be to extend the type system of Why3
itself, which would likely mean a considerable redesign of the WhyML language.
Another possible route is to extend the core of Cameleer with the ability to
translate OCaml code into other, richer, verification frameworks.


Interface with Viper and CFML. In order to augment the class of OCaml programs 
we can treat, we plan on extending Cameleer to target the Viper infrastructure 
and the CFML tool. On one hand, Viper is an intermediate verification language 
based on Separation Logic but oriented towards SMT-based software verification, 
allowing one to automatically verify heap-dependent programs. On the other 
hand, the CFML tool allows one to verify effectful higher-order programs. We 
plan on extending the CFML translation engine, in order to take source-code 
level GOSPEL annotations into account. Since it targets the rich proof language 
and type system of Coq, it can in principle be extended to reason about GADTs 
and other advanced OCaml features. Even if it relies on an interactive proof 
assistant, CFML provides a comprehensive tactics library that eases proof effort. 

Our ultimate goal is to grow Cameleer into a verification tool that can
simultaneously benefit from the best features of different intermediate verification
frameworks. Our motto: we want Cameleer to be able to verify parts of OCaml 
code using Why3, others with Viper, and some very specific functions with CFML. 
Abstract. Multi-threaded unit tests for high-performance thread-safe
data structures typically do not test all behaviour, because only a single
scheduling of threads is witnessed per invocation of the unit tests. Model
checking such unit tests allows verifying all interleavings of threads. These
tests could be written in or compiled to LLVM IR. Existing LLVM IR 
model checkers like DIVINE and Nidhugg use an LLVM IR interpreter
to determine the next state. This paper introduces LLMC, a multi-core 
explicit-state model checker of multi-threaded LLVM IR that translates 
LLVM IR to LLVM IR that is executed instead of interpreted. A test 
suite of 24 tests, stressing data structures, shows that on average LLMC 
clearly outperforms the state-of-the-art tools DIVINE and Nidhugg. 


1 Introduction 


High-performance software often uses thread-safe data structures to allow mul- 
tiple threads access to the data, without corrupting it. Unit tests for such data 
structures typically do not test all behaviour, because the thread scheduler of 
the run-time environment non-deterministically chooses only a single interleaving.
Thus, only a single trace is witnessed each time the unit test is invoked. If
we model check [1] these unit tests, we can witness all possible traces by
exploring all thread schedules. Because it does not depend on the run-time envi- 
ronment, model checking can become part of a continuous integration pipeline, 
enabling push-button verification of multi-threaded software. 
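
The gap between a single test run and the full set of schedules can be illustrated with a small OCaml sketch. It is not part of LLMC: it assumes a hypothetical two-instruction thread (load the shared counter into a private register, then store register + 1) and enumerates every interleaving of two such threads.

```ocaml
(* Sketch only: enumerate all interleavings of two threads that each perform
   a non-atomic increment (load, then store) on a shared counter. *)
type instr = Load | Store

(* All interleavings of two instruction sequences; each instruction is
   tagged with the id (0 or 1) of the thread that executes it. *)
let rec interleavings xs ys =
  match xs, ys with
  | [], rest -> [ List.map (fun i -> (1, i)) rest ]
  | rest, [] -> [ List.map (fun i -> (0, i)) rest ]
  | x :: xs', y :: ys' ->
    List.map (fun t -> (0, x) :: t) (interleavings xs' ys)
    @ List.map (fun t -> (1, y) :: t) (interleavings xs ys')

(* Execute one schedule and return the final value of the shared counter. *)
let run schedule =
  let shared = ref 0 in
  let regs = [| 0; 0 |] in
  List.iter
    (fun (tid, i) ->
      match i with
      | Load -> regs.(tid) <- !shared
      | Store -> shared := regs.(tid) + 1)
    schedule;
  !shared

let thread = [ Load; Store ]

(* Distinct final values over all schedules. *)
let outcomes =
  interleavings thread thread |> List.map run |> List.sort_uniq compare
```

Here there are six schedules and two distinct final values: 2 when the increments do not overlap and 1 when an update is lost. A unit test that happens to run one of the non-overlapping schedules would never observe the lost update.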

These thread-safe data structures can be written in or compiled to LLVM IR, 
the intermediate representation of the LLVM Project [2]. The LLVM Project is 
a collection of modular and reusable compiler and toolchain technologies. Many 
front-ends for LLVM IR exist, for example for C, C++, Java, Ruby, and Rust, 
potentially allowing an LLVM IR model checker to be usable for many languages. 


1.1 Related Work 


Model checkers that operate on LLVM IR already exist, for example DIVINE, 
Nidhugg, RCMC and LLBMC. DIVINE [3] is a stateful multi-core model checker 
of multi-threaded LLVM IR. It has many features such as capturing I/O during 
model checking, SC and TSO memory models, library support such as libc and 
libpthread. Input programs are linked with DIVINE’s operating system layer, 
DiOS, and are interpreted as a whole on the DiVM virtual machine. 


© The Author(s) 2021 
A. Silva and K. R. M. Leino (Eds.): CAV 2021, LNCS 12760, pp. 690-703, 2021. 
DIVINE detects memory operations to thread-private memory, by traversing
the heap on-the-fly and recognizing if a memory-object is either known only to 
one thread or to multiple [4]. In the former case, memory operations to that 
memory-object can be collapsed, i.e. joined with the previous instruction. 

Nidhugg [5] is a stateless multi-core model checker of multi-threaded LLVM 
IR that uses an LLVM IR interpreter. It features a sophisticated partial-order 
reduction, rfsc [6], that categorizes traces according to which read reads from 
which write and traverses only one trace in each category. In practice this reduc- 
tion is quite powerful. However, Nidhugg comes with a caveat: because Nidhugg 
is stateless, common prefixes of traces are traversed once per trace instead of 
once in total. This down-side of a stateless approach becomes more pronounced 
with longer and more often occurring common traces. Moreover, Nidhugg might 
not terminate in the presence of infinite loops. 

RCMC [7] is also a stateless LLVM IR model checker. During execution within
its LLVM IR interpreter, it keeps track of a happens-before graph of all observed 
memory operations. Using this, RCMC can determine the possible values a read 
can observe, without simply executing all interleavings of all threads. Unlike 
Nidhugg, it does not support heap memory and is only released in binary form. 

CBMC [8] is a bounded model checker for C and C++ programs, using SMT 
solving to check for memory safety, exceptions, undefined behaviour and asser- 
tions. Loops and recursion are a problem for CBMC when their bound cannot 
be determined: one needs to set an upper bound on the number of unwindings. 

LLBMC [9] is similar to CBMC, using SMT-solving to find bugs, but only 
for single-threaded C/C++ programs and it operates on LLVM IR. 

Other, less related tools include SMACK [10], SeaHorn [11] and KLEE [12]. 


1.2 Contribution 


This paper introduces LLMC 0.2, a stateful multi-core model checker of multi- 
threaded LLVM IR. Instead of using an LLVM IR interpreter like DIVINE, 
Nidhugg and LLMC 0.1 [13], it transforms input LLVM IR to LLVM IR that
implements the DMC API, the next-state interface to the model checker DMC [14]. 
We call this transformation process LL2DMC and combined with DMC (Fig. 
1), it allows for up to three orders of magnitude higher throughput (states/s) 
than DIVINE. At present, LLMC lacks sophisticated state space reductions, caus- 
ing state space sizes of roughly two orders of magnitude larger than DIVINE. We 
compared LLMC to DIVINE and Nidhugg using a test suite covering various data 
structures. Overall, despite the lack of sophisticated reductions, LLMC is on aver- 
age an order of magnitude faster than DIVINE and ~3.8x faster than Nidhugg. 
Additionally, LLMC is able to compute the state spaces of the tests where DIVINE 
or Nidhugg fail. 


Fig. 1. The flow of how an LLVM IR input program is verified in LLMC. 
2 LLMC: Low-Level Model Checker


This section explains how the transformation process (LL2DMC) transforms the 
input LLVM IR of a program to LLVM IR that implements the DMC API. LLMC 
supports LLVM IR compiled from C and C++, by handling a number of builtins 
(e.g. __atomic_* for atomic memory operations), part of libpthread (for thread 
support), libc (e.g. for memory allocation) and global constructors. 


2.1 DMC Model Checker 


The model created by LL2DMC is given to DMC to explore. DMC interacts with the
model via the DMC API (NEXTSTATE API and DTREE API combined) as illustrated in
Fig. 2: after requesting the initial state from the model, DMC continues to
request successor states, until the state space has been generated. A state is
a vector of 32-bit integers; two states need not be of the same length.
The states are stored in the concurrent compression tree DTREE [14], allowing
lossless compression, fast insertion and duplicate detection of states. When
inserted, states are given a unique StateID. A StateID can be stored in states
as well, thus allowing the creation of a DAG of states: a root-state and
sub-states. Additionally, DTREE allows incremental updates to a state, without
having the actual contents of the state and it allows partial reconstruction of
states. This delta interface uses the StateID to identify states and can avoid
needless copying of entire states, increasing performance. DMC exposes these
DTREE features as part of the DMC API [14].


Fig. 2. DMC model checker (figure labels: NS API, Search Core, DTREE API)

2.2 Input Language to LL2DMC: LLVM IR


To understand how LLMC handles input LLVM IR [2], we briefly explain it here. 
LLVM IR supports control flow by way of basic blocks. Basic blocks are a list of 
instructions that execute sequentially. The last instruction of a basic block is a 
terminator instruction, such as a branch (jump) instruction or return statement. 

LLVM IR uses static single assignment (SSA) form for register values. To support
data flow depending on control flow, φ-nodes exist. These nodes are instructions
at the beginning of a basic block that take a value depending on the basic block
from which control jumped to the basic block containing the φ-nodes.


2.3 Output of LL2DMC: Model Implementing DMC API


The output of LL2DMC is a model that implements the NEXTSTATE API part 
of the DMC API of the model checker DMC [14]. The NEXTSTATE API requires 
two interfaces from a model: one to communicate the initial state and one to 
generate next states, given a state. 
The initial state of a model generated by LL2DMC is as if one just started
the program: registers are unused, global memory is initialized to 0 and a call to
the global constructor (@llvm.global_ctors) is set up. Global constructors are
functions that are called before main, which are used to initialize memory and
perform miscellaneous initialization, such that the executable is set up properly
before main is invoked. Having the initial state in this manner allows the global
constructor to be part of the state space and thus be checked as well.

Starting with the initial state, DMC will keep asking the model to generate 
the next states for a given state, by invoking the next-state interface of the
model, until there are no more new states of which to request next states. Given 
a state, the next-state interface determines the states reachable from that state. 
In the case of a model generated by LL2DMC, first the global constructors of the 
modelled program are explored, thus faults in global constructors are detected. 
When the global constructors are completed, a call to main is set up. At this 
point, the exploration is performed until no new states are visited. 
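
The interplay between the two interfaces can be sketched as a simple worklist loop. The OCaml code below is a sequential stand-in written for illustration, not DMC's implementation: it assumes states are int arrays, replaces DTREE with a standard hash table, and uses a hypothetical toy model (toy_next) with two counters that can each be incremented up to 2.

```ocaml
(* Sketch only: explore a state space given an initial state and a
   next-state function, until no new states are found. *)
let explore (initial : int array) (next : int array -> int array list) : int =
  let visited = Hashtbl.create 64 in   (* stand-in for the state storage *)
  let queue = Queue.create () in
  Hashtbl.replace visited initial ();
  Queue.add initial queue;
  while not (Queue.is_empty queue) do
    let s = Queue.pop queue in
    List.iter
      (fun s' ->
        (* only states not seen before are stored and expanded later *)
        if not (Hashtbl.mem visited s') then begin
          Hashtbl.replace visited s' ();
          Queue.add s' queue
        end)
      (next s)
  done;
  Hashtbl.length visited

(* Hypothetical toy model: either counter may be incremented while below 2. *)
let toy_next s =
  List.filter_map
    (fun i ->
      if s.(i) < 2 then begin
        let s' = Array.copy s in
        s'.(i) <- s'.(i) + 1;
        Some s'
      end
      else None)
    [ 0; 1 ]

(* Number of reachable states of the toy model. *)
let toy_states = explore [| 0; 0 |] toy_next
```

The loop terminates because duplicate detection prevents a state from being expanded twice; the toy model has nine reachable states, one per pair of counter values.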


2.4 State Space Exploration 


This section describes the next-state function and how it is generated from LLVM 
IR. Figure 3 describes what a state looks like. A state contains information not 
unlike what an operating system keeps track of [15]. All instructions are mapped 
to a unique index, such that the |PC| (program counter) uniquely identifies the
current position in code. The field |Thread results| holds the return values of
finished threads; the field |#threads| specifies the number of threads in the current
state. The remainder of the state constitutes a list of per-thread data.

Each thread has its own |PC| and can independently manipulate it by function
calls or branching. |Status| fields are used to indicate whether the thread/program
is running, done or failed. Each thread has its own set of |Registers|, the current
state of LLVM IR registers. The size of |Registers| is determined by the function
requiring the largest number of LLVM IR registers. Function calls manipulate
these registers and the list of stack frames described by |Previous frame|.

A |Field| is a StateID to a sub-state, as described in Sect. 2.1. The separation
into a root-state and sub-states allows sub-states to grow and the state storage
component of DMC, DTREE, to compress them using tree compression [14]. It also
allows the use of the delta interface: a write to memory can be simply translated
to a single, efficient call, taking the current |Memory| index, the offset to write to
and the new data. The resulting index can be written to |Memory|.


Fig. 3. A description of the state used by LLMC.

A single LLVM IR instruction in the program is translated to many LLVM IR 
instructions in the model. We will distinguish LLVM IR registers in the model 
from registers in the source program by calling the former model-registers. In 
general, a single LLVM IR instruction is translated to a single step with three 
phases: In the Preamble phase, operands to the source LLVM IR instruction 
are remapped to model-registers and loaded from |Registers| or |Memory|. In the 
Action phase, the source LLVM IR instruction is cloned, with the operands 
remapped to the LLVM IR model-registers set up during the Preamble phase. 
In the Epilogue phase, if the source LLVM IR instruction assigns a value to a 
register, the value returned by the cloned instruction is written to |Registers|. 

Listing 1 illustrates how a step is performed as part of the next-state function. 
Multiple steps can be performed as part of the same transition (line 8), as long 
as the changes are local to the thread (line 4). This is explained in more detail 
in Sect. 2.5. The step function is called for every thread in the state vector. 


2.4.1 Register Manipulation 

Note that the |Registers| are not separated into a sub-state, like |Memory|. We 
chose this such that simple register manipulating LLVM IR instructions would 
have no need for an indirection and directly translate to an identical instruction, 
with its operands mapped such that they are loaded from the |Registers| and the 
return value of the instruction written back to the corresponding register. This 
allows us to trivially collapse such instructions, combining the Preamble phases, 
requiring dependencies only to be loaded once. 


2.4.2 Memory Instructions 

Memory instructions such as loads and stores can be directly mapped to the
delta interface, reading or writing only a part of the |Memory| sub-state. There is
no distinction between memory allocated on the stack (alloca) and on the heap
(malloc): both allocate memory by growing the |Memory| sub-state. The returned
pointer describes which thread created the memory and the offset within the
sub-state. Any thread can write to and read from any such memory location.
At present, memory cannot be freed, so free has no effect. Because of the tree
compression, this has no detrimental effect on memory usage, but does mean
LLMC currently does not detect free-related bugs.


Listing 1 In the next-state function, the step function is called for each thread.

 1 void step(StateVector sv, int threadID)
 2   bool onlyLocal = true;   # true while handling commutative instructions
 3   bool emit = false;       # set to true when new state is to be emitted
 4   while(sv.threads[threadID].pc > 0 and onlyLocal)
 5     switch(sv.threads[threadID].pc)
 6       case 0: break;       # not running, do nothing
 7       case SomePC:         # PC of first instruction of group
 8         # statically collapsed instructions: preamble, action, epilogue
 9         # sv.threads[threadID].pc, onlyLocal and emit may change
10       ...
11   if(emit) MC.insert(sv);  # emit new state if needed


2.4.3 Branching, Function Calls and Threading 

To support control flow in LLMC, the |PC| can be changed to the index assigned to
the first instruction in the target basic block. If the target basic block contains
φ-nodes, those registers are updated to the value corresponding to the basic
block we are branching from.

Function calls set up a new stack frame with the current |Registers|, |PC| and
where to write the return value, then push it to the linked list of frames pointed
to by |Previous frame|. A return from a function pops the top frame from the
list of frames, copies the |Registers| into the state vector, updates the |PC| and
writes the return value into the right register. There is no bound on the number
of frames; the last frame has |Previous frame| set to 0, indicating no next frame.
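The call and return handling described above can be sketched as a linked list of frames. This is a minimal Python model under stated assumptions: the names Frame, Thread, call and ret are hypothetical, registers are a dictionary, and None stands in for the 0 marker of the last frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    registers: dict              # caller's registers at the time of the call
    pc: int                      # where the caller resumes
    return_register: str         # caller register that receives the return value
    previous: Optional["Frame"]  # None plays the role of the 0 marker


@dataclass
class Thread:
    registers: dict
    pc: int
    frames: Optional[Frame] = None


def call(thread, callee_pc, args, return_register, resume_pc):
    """Save the caller's context in a new frame and jump to the callee."""
    thread.frames = Frame(thread.registers, resume_pc, return_register, thread.frames)
    thread.registers = dict(args)
    thread.pc = callee_pc


def ret(thread, value):
    """Pop the top frame, restore the caller's context, write the return value."""
    frame = thread.frames
    thread.frames = frame.previous
    thread.registers = dict(frame.registers)
    thread.pc = frame.pc
    thread.registers[frame.return_register] = value


t = Thread(registers={"a": 1}, pc=10)
call(t, callee_pc=100, args={"x": 5}, return_register="r", resume_pc=11)
call(t, callee_pc=200, args={"y": 6}, return_register="s", resume_pc=101)
ret(t, 7)  # back in the first callee, register s holds 7
ret(t, 8)  # back in the original caller, register r holds 8
```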

Threads are created (pthread_create) by enlarging the root state with
enough space to fit another thread and incrementing |#threads|. When a thread
is done, it is marked as such, but not removed from the state vector. This is to
retain the memory allocated by a thread. Due to the compression of DTREE, it has
little impact on the memory footprint of the state space. The return value from
the thread is added to |Thread results|, where it can be read (pthread_join).


2.5 State Space Reduction 


Instructions that only have an effect local to a thread do not change the 
behaviour of another thread. Such instructions are commutative; their respective 
ordering is not relevant. Thus, such instructions can be collapsed with the pre- 
vious or next instruction. For example, instructions that read and write only to 
registers of a thread are local instructions and do not influence another thread. 
Branching and function calls are other such commutative instructions. 

LLMC collapses commutative instructions statically as well as dynamically. The 
latter is needed to collapse instructions after conditional control flow, because 
statically the condition is unknown. On-the-fly, the condition is evaluated, the 
branch taken and it is determined if the next instruction can be collapsed. 
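To make the effect of collapsing concrete, the following Python sketch explores a toy program with and without collapsing thread-local instructions. It is only an illustration under assumptions: the two-kind instruction set, the PROGRAM list and all function names are invented for this example, and collapsing is decided per executed instruction rather than by LLMC's actual static and dynamic analysis.

```python
LOCAL, SHARED = "local", "shared"
# Every thread runs the same toy program: LOCAL increments the thread's
# register, SHARED adds the register to a shared counter.
PROGRAM = [LOCAL, LOCAL, SHARED, LOCAL, SHARED]


def execute(state, tid):
    """Execute one instruction of thread `tid`; return (new state, instruction)."""
    pcs, regs, shared = state
    op = PROGRAM[pcs[tid]]
    pcs = pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:]
    if op == LOCAL:
        regs = regs[:tid] + (regs[tid] + 1,) + regs[tid + 1:]
    else:
        shared += regs[tid]
    return (pcs, regs, shared), op


def step(state, tid, collapse):
    """Successor of `state` for thread `tid`, or None if the thread is done."""
    if state[0][tid] >= len(PROGRAM):
        return None
    only_local = True
    while only_local and state[0][tid] < len(PROGRAM):
        state, op = execute(state, tid)
        # Keep going only while the executed instruction was thread-local.
        only_local = collapse and op == LOCAL
    return state


def explore(num_threads, collapse):
    """Return the set of all states reachable from the initial state."""
    init = ((0,) * num_threads, (0,) * num_threads, 0)
    seen, todo = {init}, [init]
    while todo:
        state = todo.pop()
        for tid in range(num_threads):
            nxt = step(state, tid, collapse)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


plain = explore(2, collapse=False)
collapsed = explore(2, collapse=True)
```

For two threads, the collapsed exploration stores 9 states instead of 36, while both reach the same single end state.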


2.5.1 Thread-Private Memory

LLMC collapses all such commutative
instructions, with the important exception of memory operations on memory
only accessible to the current thread (memory operations to memory accessible
to other threads are never collapsed). This requires knowledge of which memory
each thread can access, which LLMC currently does not track. DIVINE implements
this [4] by traversing the memory graph in every state, using a run-time
type system to identify pointers and how to follow them (edges); each allocation
yields a node.

Nidhugg uses a partial-order reduction [6] that takes into account from which
write a value read by a read originates. In this process, memory operations to 
thread-private memory are indeed collapsed, because a read can read only a 
single value: the last value written by the thread itself. The current version of 
LLMC does not feature an on-the-fly state space reduction for memory operations. 
Instead, we preprocess the input LLVM IR and statically annotate memory oper- 
ations that cannot be proven to be local to a thread. While this does reduce the 
state space, because many operations are to stack variables that remain thread- 
private, it can only approach the on-the-fly reductions of DIVINE and Nidhugg. 


3 Evaluation 


Table 1 shows a feature comparison between the tools mentioned in Sect. 1.1. 
The table shows that RCMC and CBMC do not support dynamic memory in 
the presence of multiple threads. This limits their usability for our use case, 
model checking multi-threaded tests of data structures, since numerous thread- 
safe data structures use dynamic memory. Furthermore, RCMC, CBMC and 
LLBMC do not support infinite loops and only have limited support for spin- 
locks. More complex infinite loops like appending a new node in the Michael- 
Scott queue [17] using compare-and-swap are not supported. Thus, we focus on 
an experimental comparison between LLMC, DIVINE and Nidhugg on execution 
time, memory footprint of the state space and scalability across multiple threads, 
since all three tools support using multiple threads for model checking. 


Table 1. A feature comparison between the tools mentioned in Sect. 1.1.

a Models [16]: S) Sequentially consistent; T) TSO; P) PSO; W) POWER; A) ARM.
b Not supported in combination with threads.
c Only trivial spin-locks are supported.
d Threads within global constructors not supported.


We ran our experiments on a Dell R930 with 4 E7-8890-v4 CPUs totaling 96
cores and 2 TiB RAM. All sources were compiled using GCC 9.3.0.


3.1 Test Suite 


We tested the tools using four real-world concurrent LLVM IR data structures,
one concurrent algorithm and one protocol. Sources for all tests are available
online¹. We instantiate the tests with various combinations of threads and
number of elements inserted, processed or dequeued. All combinations are listed
later, in Table 2. These six tests cover different classes of problem types,
different shapes of state spaces, and serve to illustrate the strengths and
weaknesses of the tools:


— SortedLinkedList illustrates a concurrency problem where a number of
elements are inserted by a number of threads, with a single outcome: all paths
converge to one state. Elements can be inserted throughout the chain.

— LinkedList is similar to SortedLinkedList, but with various outcomes,
because the list is not sorted. It has high contention on the head of the chain.

— Prefixsum is a concurrent approach to determine all sums up to any index
in an array. It highlights the ability of the model checker to determine
thread-private memory, because the two-pass prefixsum algorithm actually partitions
the problem into separate per-thread problems that require no communication
and one single-threaded part.

— Hashmap illustrates a concurrency problem where a key is inserted using
compare-and-swap, followed by either atomically storing the value or
busy-waiting on the value, if the key already exists (findOrPut [18]). The latter
involves atomically loading the value until a non-empty value is loaded.

— MSQ is the well-known Michael-Scott queue [17]. It is similar to LinkedList,
with the addition of dequeue operations, which may return nothing when
the queue is empty. The dequeuer can be made blocking by calling dequeue
until it successfully dequeues an element; this is done in two of the MSQ
configurations.

— Philosophers is the Dining Philosophers Problem [19], a commonly used
protocol to illustrate issues in concurrent resource management. It involves
P philosophers and P forks; each philosopher grabs their left fork, then the
right, then puts the right fork back, then the left. This is repeated R times.
The crux is that each fork is a shared resource for two philosophers. For our
test suite, this illustrates contention on multiple elements in a single array.
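The structure that makes Prefixsum a test of thread-private memory detection can be sketched as follows. This is a sequential Python simulation of a two-pass prefix sum, not the test program from the suite; the function name and the chunking scheme are assumptions made for illustration.

```python
from itertools import accumulate


def prefix_sum_two_pass(values, num_threads):
    """Sequentially simulate a two-pass prefix sum over `num_threads` chunks."""
    if not values:
        return []
    chunk = -(-len(values) // num_threads)  # ceiling division
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]

    # Pass 1 (one chunk per thread, no communication): local prefix sums.
    local = [list(accumulate(part)) for part in chunks]

    # Single-threaded part: the offset of each chunk is the total of all
    # chunks before it.
    offsets, total = [], 0
    for sums in local:
        offsets.append(total)
        total += sums[-1]

    # Pass 2 (one chunk per thread, no communication): add the chunk offset.
    return [s + off for sums, off in zip(local, offsets) for s in sums]


result = prefix_sum_two_pass([1, 2, 3, 4, 5, 6, 7], num_threads=3)
```

In both passes each simulated thread touches only its own chunk, which is why a model checker that recognises thread-private memory needs to explore far fewer interleavings.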


These tests highlight the strengths and weaknesses of each tool using
real-world data structures and algorithms. The well-known Michael-Scott queue,
for example, is used in many software packages. The tests reflect different kinds of
state spaces: LinkedList focuses on “wide” state spaces, with many end states;
SortedLinkedList exemplifies state spaces that go wide, but converge into a
single end state; Prefixsum highlights the model checker’s ability to detect
thread-local memory: model checkers that can detect this have a narrow state
space, otherwise a model checker will explore all interleavings.


1 https://github.com/bergfi/llmc/tree/cav2021/tests/performance.


3.2 Observations and Considerations


For each model, we verified that all expected end states were reachable. For
example, for one LinkedList test, we manually verified that all 8!/(4!4!) = 70
possible outcomes of the linked list were generated.
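The count of 70 can be reproduced with a short Python sketch. It assumes, as one reading of the formula, that two threads each insert four distinct elements and that an outcome is a merge of the two insertion sequences that preserves each thread's own order; the element names are placeholders.

```python
from itertools import combinations
from math import comb, factorial


def outcomes(a, b):
    """All merges of sequences a and b that keep each sequence's own order."""
    n = len(a) + len(b)
    result = set()
    for slots in combinations(range(n), len(a)):
        # `slots` are the positions in the merged list taken by elements of a.
        ia, ib = iter(a), iter(b)
        merged = tuple(next(ia) if i in slots else next(ib) for i in range(n))
        result.add(merged)
    return result


lists = outcomes("abcd", "wxyz")
expected = factorial(8) // (factorial(4) * factorial(4))
```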

We witnessed DIVINE returning varying state space sizes across different runs 
on the same test when using multiple threads, indicating a concurrency problem. 
It also occasionally crashed, most often when using 192 threads. Even though this 
indicates the answers DIVINE gives might not be correct, we opted to include the 
results, assuming they would at least provide an indication of the performance. 

Furthermore, we did run RCMC on a number of tests. RCMC often runs out
of memory before crashing; likely the result of an infinite loop. Even for some
small tests, it could not finish within 100x the time other tools needed.


3.3 Experimental Results 


Figure 4 shows the results of LLMC compared to DIVINE on state space explo- 
ration time (4a) and Nidhugg on wall-clock time (4b) when applied to the models 
from Table 2. These graphs indicate relative performance: the uppermost (blue) 
line for example indicates the line where LLMC is 100x faster. Figure 4c compares 
LLMC (lower data points) and DIVINE (upper data points) on the memory com- 
pression of the state spaces they generate. Figure 4d compares LLMC (upper data 
points) and DIVINE (lower data points) on the throughput of states per second. 


3.3.1 LLMC vs DIVINE 

Looking at the results in Fig. 4a, we see that LLMC outperforms DIVINE by at
least 5x in all test cases except Prefixsum and two SortedLinkedList tests.
LLMC suffers in the Prefixsum tests because of the lack of dynamic
thread-private memory detection. This results in significantly larger state spaces, up to
three orders of magnitude larger in one case, as seen in Fig. 4c.

Comparing the sorted and non-sorted linked list cases, we notice LLMC
is able to outperform DIVINE in the non-sorted cases by higher factors than in the
sorted cases. This difference can be explained by the fact that the two tools
generate more similarly sized state spaces for the non-sorted cases than for the
sorted cases. For example, LLMC generates ~14.4x more states than DIVINE for one
sorted case, but only ~2.2x more for one non-sorted case. This highlights that
LLMC lacks a reduction technique that works well for DIVINE in the sorted cases,
but not as well in the non-sorted cases.

For the two Hashmap cases that both tools completed, LLMC outperforms
DIVINE by 8.4x and 157x. Since the hash map is a single global memory object all
threads can access, LLMC does not have the disadvantage of lacking a dynamic
thread-private memory reduction. DIVINE crashed for the two other Hashmap test
cases.

DIVINE is unable to complete two of the four Michael-Scott queue tests,
crashing on both; the others are verified 86x and 272x faster by LLMC than by
DIVINE.

As the complexity of the Philosopher test cases increases, LLMC increasingly
outperforms DIVINE. The two tools generate similarly sized state spaces,
because the high contention leaves relatively few memory instructions to be
collapsed by DIVINE’s reduction, thus levelling the playing field.

In summary, LLMC is able to outperform DIVINE in most of the test cases,
mostly by a factor of 10x-100x, with an outlier as high as 2450x.
This highlights the performance difference, as on average LLMC visits ~1.4M
states per second (~8.5M states/s for one Philosophers test), where DIVINE
visits ~4k states per second (Fig. 4d).


Fig. 4. All experimental results; see Table 2 for a legend. Results above the DNF line
mean the tool on the y-axis Did Not Finish, not supporting the test.


Table 2. The six tests with various combinations of number of threads and
elements, totaling 24 input programs. MSQ configurations describe a combination of
Enqueuers and ([B]locking) Dequeuers in parallel (||) and sequential (;).


3.3.2 LLMC vs Nidhugg 

Moving on to Fig. 4b, we notice Nidhugg is unable to complete any of the
Michael-Scott queue, Hashmap or Philosopher test cases. This is because
Nidhugg supports neither the __atomic_* instructions needed for the
Michael-Scott queue nor the spin-lock used in the Hashmap and Philosopher
tests. We tried Nidhugg’s transformation capabilities to transform the spin-lock
to an assume statement, thus limiting the traces traversed to the ones where
the condition of the spin-lock holds, but the generated LLVM IR was invalid
and could not be used. Additionally, we tried an experimental version (7b8be8a)
with a changelog containing potential fixes, to no avail.

We see that Nidhugg outperforms LLMC in the Prefixsum test cases
consistently by multiple orders of magnitude: Nidhugg traverses only a single trace
for each of these test cases. This highlights the strength of Nidhugg in its ability
to conclude that each read can only read a single value. Without this technique,
LLMC needs to exhaustively go through all interleavings of the threads.

For the linked list, sorted and non-sorted, we see that as the cases get
bigger, LLMC is able to outperform Nidhugg. This highlights the disadvantage
of stateless model checking: bigger state spaces tend to cause more common
prefixes of paths, which causes more work for stateless model checking.
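The cost of revisiting states can be illustrated with a toy count. The Python sketch below explores all interleavings of independent threads without storing visited states and without any reduction, and compares the transitions executed with the number of distinct states. It is an assumption-laden illustration only: Nidhugg applies partial-order reduction and would not explore all of these traces, and the function name is hypothetical.

```python
def count_work(num_threads, steps):
    """Return (distinct states, complete traces, transitions executed)
    for an unreduced depth-first exploration that stores no visited states."""
    states = set()
    counters = {"traces": 0, "executed": 0}

    def dfs(pcs):
        states.add(pcs)  # recorded only to count distinct states afterwards
        if all(pc == steps for pc in pcs):
            counters["traces"] += 1
            return
        for tid in range(num_threads):
            if pcs[tid] < steps:
                counters["executed"] += 1
                dfs(pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:])

    dfs((0,) * num_threads)
    return len(states), counters["traces"], counters["executed"]


small = count_work(num_threads=2, steps=3)
larger = count_work(num_threads=2, steps=6)
```

With two threads of three steps each there are 16 distinct states, but the unreduced stateless exploration executes 68 transitions across 20 traces; the gap widens as the programs grow.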


3.3.3 Scalability 

Figure 5 shows the results for various numbers of threads for
SortedLinkedList3.9, chosen for the performance similarity of
the three tools. The graph shown is typical: other tests expose similar
patterns as the one we highlight here. DIVINE does not scale well in the
number of threads: its peak performance lies typically around 4 or 8 threads,
confirmed by the DIVINE developers². Nidhugg expectedly does scale
very well, as threads just execute a specific trace, with hardly any
communication. LLMC shows some scalability, but a ~4x improvement using
192 threads leaves a lot of room for improvement³.


Fig. 5. Scalability comparison of DIVINE, LLMC and Nidhugg.


2 https://divine.fi.muni.cz/trac/ticket/44.
3 https://github.com/bergfi/dmc/issues/1.


3.3.4 DMC and DTREE

We highlight one aspect of the performance of LLMC: the underlying model
checker DMC and its storage component DTREE [14]. In Fig. 4c, we notice
that although LLMC on average generates state spaces an order of magnitude
larger than DIVINE’s, it uses two orders of magnitude less memory per
state, due to DTREE. Furthermore, DTREE allows a delta to be applied to a state
without reconstructing the entire state. Since states are typically ~2 KiB in these
tests, this avoids a significant amount of memory copying and increases performance.
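The idea of applying a delta without rebuilding the whole state can be sketched with a generic hash-consed binary tree. This Python model is an assumption for illustration and is not DTREE's actual data structure or API; the class TreeStore and its methods are hypothetical names.

```python
class TreeStore:
    """Stores state vectors as binary trees whose nodes are shared (interned)."""

    def __init__(self):
        self.ids = {}    # node content -> node id
        self.nodes = []  # node id -> node content

    def _intern(self, node):
        if node not in self.ids:
            self.ids[node] = len(self.nodes)
            self.nodes.append(node)
        return self.ids[node]

    def insert(self, vector):
        """Store a full vector and return the id of its root node."""
        if len(vector) == 1:
            return self._intern(("leaf", vector[0]))
        mid = len(vector) // 2
        left, right = self.insert(vector[:mid]), self.insert(vector[mid:])
        return self._intern(("pair", left, right))

    def update(self, root, length, index, value):
        """Apply a one-slot delta: only the path from the root to the slot is
        rebuilt, every other subtree is reused by id."""
        if length == 1:
            return self._intern(("leaf", value))
        _, left, right = self.nodes[root]
        mid = length // 2
        if index < mid:
            left = self.update(left, mid, index, value)
        else:
            right = self.update(right, length - mid, index - mid, value)
        return self._intern(("pair", left, right))

    def get(self, root):
        """Reconstruct the full vector (needed only when a state is inspected)."""
        node = self.nodes[root]
        if node[0] == "leaf":
            return (node[1],)
        return self.get(node[1]) + self.get(node[2])


store = TreeStore()
a = store.insert((1, 2, 3, 4, 5, 6, 7, 8))
nodes_before = len(store.nodes)
b = store.update(a, length=8, index=5, value=60)
nodes_added = len(store.nodes) - nodes_before
```

Changing one slot of the eight-slot vector adds four nodes (one leaf and the three nodes on its path) instead of a fresh copy of the state, and the original state remains available under its old root.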


4 Conclusion 


We have introduced LLMC 0.24, the multi-threaded low-level model checker that 
model checks software via LLVM IR. It translates the input LLVM IR into a 
model LLVM IR that implements the DMC API, the API of the high-performance 
model checker DMC. This allows LLMC to execute the model’s next-state func- 
tion, instead of interpreting the input LLVM IR, like DIVINE and Nidhugg. We 
compared LLMC to these tools using a test suite of 24 tests, covering various 
data structures. LLMC outperforms DIVINE and Nidhugg by up to three orders of
magnitude, while other tests have shown areas for improvement. Averaging the 
results of all completed tests, LLMC is an order of magnitude faster than DIVINE 
and ~3.4x faster than Nidhugg. DIVINE and Nidhugg are unable to complete 4 
and 12 tests, respectively, due to crashing or not supporting infinite loops or 
__atomic_* library calls. 


Future Work. LLMC will benefit most from a state space reduction technique that 
collapses memory instructions to thread-private memory. We aim to integrate 
this as part of a memory emulation layer that also adds support for relaxed 
memory models. Even without the dynamic reduction technique, the results 
show that LLMC in its current form is a high performing tool to model check 
software.


Abstract. A program verifier produces reliable results only if both the
logic used to justify the program’s correctness is sound, and the imple- 
mentation of the program verifier is itself correct. Whereas it is common 
to formally prove soundness of the logic, the implementation of a veri- 
fier typically remains unverified. Bugs in verifier implementations may 
compromise the trustworthiness of successful verification results. Since 
program verifiers used in practice are complex, evolving software systems, 
it is generally not feasible to formally verify their implementation. 

In this paper, we present an alternative approach: we validate suc- 
cessful runs of the widely-used Boogie verifier by producing a certificate 
which proves correctness of the obtained verification result. Boogie per- 
forms a complex series of program translations before ultimately generat- 
ing a verification condition whose validity should imply the correctness 
of the input program. We show how to certify three of Boogie’s core 
transformation phases: the elimination of cyclic control flow paths, the 
(SSA-like) replacement of assignments by assumptions using fresh vari- 
ables (passification), and the final generation of verification conditions. 
Similar translations are employed by other verifiers. Our implementa- 
tion produces certificates in Isabelle, based on a novel formalisation of 
the Boogie language. 


1 Introduction 


Program verifiers are tools which attempt to prove the correctness of an imple- 
mentation with respect to its specification. A successful verification attempt is, 
however, only meaningful if both the logic used to justify the program’s correct- 
ness is sound, and the implementation of the program verifier is itself correct. It 
is common to formally prove soundness of the logic, but the implementations of 
program verifiers typically remain unverified. As is standard for complex software 
systems, bugs in verifier implementations can and do arise, potentially raising 
doubts as to the trustworthiness of successful verification results. 


© The Author(s) 2021 
A. Silva and K. R. M. Leino (Eds.): CAV 2021, LNCS 12760, pp. 704-727, 2021. 
https://doi.org/10.1007/978-3-030-81688-9_33

One way to close this gap is to prove a verifier’s implementation correct.
However, such a once-and-for-all approach faces serious challenges. Verifying
an existing implementation bottom-up is not practically feasible because such
implementations tend to be large and complex (for instance, the Boogie
verifier [29] consists of over 30K lines of imperative C# code), use a variety of
libraries, and are typically written in efficient mainstream programming lan- 
guages which themselves lack a formalisation. Alternatively, one could develop 
a verifier that is correct by construction. However, this approach requires the 
verifier to be (re-)implemented in an interactive theorem prover (ITP) such as 
Coq [14] or Isabelle [24]. This precludes the free choice of implementation lan- 
guage and paradigm, exploitation of concurrency, and possibility of tight inte- 
gration with standard compilers and IDEs, which is often desirable for program 
verifiers [4,5, 13,26]. Both verification approaches substantially impede software 
maintenance, which is problematic since verifiers are often rapidly-evolving soft- 
ware projects (for instance, the Boogie repository [1] contains more than 5000 
commits). 

To address these challenges, in this work we employ a different approach. 
Instead of verifying the implementation once and for all, we validate specific 
runs of the verifier by automatically producing a certificate which proves the 
correctness of the obtained verification result. Our certificate generation formally 
relates the input and output of the verifier, but does so largely independently of 
its implementation, which can freely employ complex languages, algorithms, or 
optimisations. Our certificates are formal proofs in Isabelle, and so checkable by 
an independent trusted tool; their guarantees for a certified run of the verifier 
are as strong as those provided by a (hypothetical) verified verifier. 

We apply our novel verifier validation approach to the widely-used Boogie 
verifier, which verifies programs written in the intermediate verification language 
Boogie. The Boogie verifier is a verification condition generator: it verifies pro- 
grams by generating a verification condition (VC), whose validity is then dis- 
charged by an SMT solver. Certifying a verifier run requires proving that valid- 
ity of the VC implies the correctness of the input program. Certification of the 
validity-checking of the VC is an orthogonal concern; our results can be combined 
with work in that area [11,15,19] to obtain end-to-end guarantees. 

Like many automatic verifiers, Boogie is a translational verifier: it performs 
a sequence of substantial Boogie-to-Boogie translations (phases), simplifying the 
task and output of the final efficient VC computation [6,18]. The key challenges 
in certifying runs of the Boogie tool are to certify each of these phases, includ- 
ing final VC generation. In particular, we present novel techniques for making 
the following three key phases (and many smaller ones) of Boogie’s tool chain 
certifying: 


1. The elimination of loops (more precisely, cycles in the CFG) by reducing the 
correctness of loops to checking loop invariants (CFG-to-DAG phase) 

2. The replacement of assignments by (SSA-style) introduction of fresh variables
and suitable assume statements (passification phase)

3. The final generation of the VC, which includes the erasure and logical
encoding of Boogie’s polymorphic type system [33] (VC phase).


The certification of such verifier phases is related to existing work on com- 
piler verification [34] and validation [8, 41,42]. However, the translations and the 
certified property we tackle here are fundamentally different from those in com- 
pilers. Compilers typically require that each execution of the target program 
corresponds to an execution of the source program. In contrast, the encoding of 
a program in a translational verifier typically has intentionally more executions 
(for instance, allows more non-determinism). Moreover, translational verifiers 
need to handle features not present in standard programming languages such as 
assume statements and background theories. Prior work on validating such veri- 
fier phases has been limited in the supported language and extent of the formal 
guarantee; we discuss comparisons in detail in Sect. 8. 


Contributions. Our paper makes the following technical contributions. 


1. The first formal semantics for a significant subset of Boogie (including axioms, 
polymorphism, type constructors), mechanised in Isabelle. 

2. A validation technique for two core program-to-program translations occur- 
ring in verifiers (CFG-to-DAG and passification). 

3. A validation technique for the VC phase, handling polymorphism erasure and 
Boogie’s type system encoding [33], for which no prior formal proof exists.

4. A version of the Boogie implementation that produces certificates for a sig- 
nificant subset of Boogie. 


Making the Boogie verifier certifying is an important result, reducing the 
trusted code base for a wide variety of verification tools implemented via encod- 
ings into Boogie, e.g. Dafny [31], VCC [13], Corral [28], and Viper [35]. Moreover, 
the technical approach we present here can in future be applied to the
certification of the translations performed by these tools, and those based on comparable
intermediate verification languages, such as Frama-C [26] and Krakatoa [17] (based
on Why3 [16]), and Prusti [4] and VerCors [10] (based on Viper [35]).


Outline. Section 2 explains, at a high level, how our validation approach is
structured for the different phases. Section 3 introduces a formal semantics for Boogie.
Sections 4, 5 and 6 present our validation of the CFG-to-DAG, passification, and 
VC phases, respectively. Section 7 evaluates our certificate-producing version of 
Boogie. Section 8 discusses related work. Section 9 concludes. Further details are 
available in our accompanying technical report (hereafter, TR) [37]. 


2 Approach 


A Boogie program consists of a set of procedures, each with a specification and
a procedure body in the form of a (reducible) control-flow-graph (CFG), whose
blocks contain basic commands; we present the formal details in the next section.
Boogie verifies each procedure modularly, desugaring procedure calls according
to their specifications. Verification is implemented via a series of phases:
program-to-program translations and a final computation of a VC to be checked by an
SMT solver. Our goal is to formally certify (per run of Boogie) that validity of
this VC implies the correctness of the original procedure.


Fig. 1. Key phases of verification in Boogie and their certification. The solid edges show
Boogie’s transformations on a procedure body; each node $G_i$ represents a
control-flow-graph. Our final certificate (dashed edge) is constructed by formally linking the three
phase certificates represented by the dotted edges. Each of the three phase certificates
also incorporates extra smaller transformations that we do not show here.

To keep the complexity of certificates manageable, our technical approach is 
modular in three dimensions: decomposing our formal goal per procedure in the 
Boogie program, per phase of the Boogie verification, and per block in the CFG 
of each procedure. This modularity makes the full automation of our certification 
proofs in Isabelle practical. In the following, we give a high-level overview of this 
modular structure; the details are presented in subsequent sections. 


Procedure Decomposition. Boogie has no notion of a main program or an overall 
program execution. A Boogie program is correct if each of its procedures is 
individually correct (that is, the procedure body has no failing traces, as we 
make precise in the next section). Boogie computes a separate VC for each 
procedure, and we correspondingly validate the verification of each procedure 
separately. 


Phase Decomposition. We break our overall validation efforts down into per- 
phase sub-problems. In this paper, we focus on the following three most substan- 
tial and technically-challenging of these sequential phases, illustrated in Fig. 1. 
(1) The CFG-to-DAG phase translates a (possibly-cyclic) CFG to an acyclic CFG 
(cf. Sect. 4). This phase substantially alters the CFG structure, cutting loops 
using annotated loop invariants to over-approximate their executions. (2) The 
passification phase eliminates imperative updates by transforming the code into 
static single assignment (SSA) form and then replacing assignments with
constraints on variable versions (cf. Sect. 5). Both of these phases introduce extra
non-determinism and assume statements (which, if implemented incorrectly, could
make verification unsound by masking errors in the program). (3) The final VC
phase translates the acyclic, passified CFG to a verification condition that, in 
addition to capturing the weakest precondition, encodes away Boogie’s polymor- 
phic type system [33]. 
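As a small illustration of what passification does, the following Python sketch rewrites a straight-line command list into assume statements over versioned variables. It is a simplified sketch under assumptions: it handles only a single block (no branches or join points), represents expressions as strings, and uses the hypothetical naming scheme x_0, x_1, and so on; it does not reflect Boogie's implementation.

```python
import re


def passify(commands, variables):
    """Replace assignments by assumes over fresh variable versions.

    commands: list of ("assign", x, expr), ("assume", expr) or ("assert", expr)
    variables: names of the program variables occurring in the commands
    """
    version = {v: 0 for v in variables}

    def rename(expr):
        # Rewrite every program variable to its current version.
        def versioned(match):
            name = match.group(0)
            return f"{name}_{version[name]}" if name in version else name
        return re.sub(r"[A-Za-z_]\w*", versioned, expr)

    passive = []
    for cmd in commands:
        if cmd[0] == "assign":
            _, target, expr = cmd
            rhs = rename(expr)    # right-hand side reads the current versions
            version[target] += 1  # the assigned variable gets a fresh version
            passive.append(("assume", f"{target}_{version[target]} == {rhs}"))
        else:
            passive.append((cmd[0], rename(cmd[1])))
    return passive


program = [
    ("assign", "x", "x + 1"),
    ("assign", "y", "x * 2"),
    ("assert", "y > x"),
]
passive = passify(program, ["x", "y"])
```

The result contains no assignments: each update has become a constraint relating a new variable version to the previous ones, and the assertion refers to the latest versions.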

We construct certificates for each of these key phases separately (depicted 
by the blue dotted lines in Fig. 1). For each phase, we certify that if the target 
of the translation phase is correct (a correct Boogie program for the first two 
phases; a valid VC for the VC phase) then the source (program) of the phase is 
correct. This modular approach lets us focus the proof strategy for each phase 
on its conceptually-relevant concerns, and provides robustness against changes 
to the verifier since at most the certification of the changed phases may need 
adjustment. Logically, our per-phase certificates are finally glued together to 
guarantee the analogous end-to-end property for the entire pipeline, depicted by 
the green dashed edge in Fig. 1. For our certificates, we import the input and 
output programs (and VC) of each key phase from Boogie into Isabelle; we do 
not reimplement any of Boogie’s phases inside Isabelle. 

The certificates of the key phases also incorporate various smaller transfor- 
mations between the key phases, such as peephole optimisation. Our work also 
validates these smaller transformations, but we focus the presentation on the key 
phases in this paper. Boogie also performs several smaller translation steps prior 
to the CFG-to-DAG phase. These include transforming ASTs to corresponding 
CFGs, optimisations such as dead variable elimination, and desugaring proce- 
dure calls using their specifications (via explicit assert, assume, and havoc state- 
ments). Our approach applies analogously to these initial smaller phases, but our 
current implementation certifies only the pipeline of all phases from the (input 
to the) CFG-to-DAG phase onwards. Thus, our certificate relates Boogie’s VC 
to the original source AST program so long as these prior translation steps are 
correct. 


CFG Decomposition. When tackling the certification of each phase, we further 
break down validation of a procedure’s CFG in the source program of the phase 
into sub-problems for each block in the CFG. We prove two results for each block 
in the source CFG: 


1. Local block lemmas: We prove an independent lemma for each source CFG 
block in isolation, relating the executions through the block with the corre- 
sponding block in the target program (or the VC generated for that block, in 
the case of the VC phase). In particular, this lemma implies that if the target 
block has no failing executions (or the VC generated for that block holds, for 
the VC phase), neither does the source block for corresponding input states. 

2. Global block theorems: We show analogous per-block results concerning all 
executions from this block onwards extending to the end of the procedure in 
question; we build these compositionally by reverse-topological traversal of 
either the source or target CFGs, as appropriate. The global block theorem 
for the entry block establishes correctness of the phase. 
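The way per-block results compose along the CFG can be sketched as a reverse-topological traversal. In this Python illustration, a boolean per block stands in for a proved local block lemma, and the derived boolean stands in for the global block theorem; this is an analogy for the proof structure under that assumption, not the Isabelle proofs themselves, and the block names are made up.

```python
from graphlib import TopologicalSorter


def global_results(successors, local_ok):
    """Combine per-block results into results about all executions from a block.

    successors: acyclic CFG as a mapping block -> list of successor blocks
    local_ok:   mapping block -> True if the block itself has no failing execution
    """
    # Passing successors where graphlib expects predecessors yields an order
    # in which every block appears after all of its successors.
    order = TopologicalSorter(successors).static_order()
    global_ok = {}
    for block in order:
        global_ok[block] = local_ok[block] and all(
            global_ok[succ] for succ in successors[block]
        )
    return global_ok


cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
all_fine = global_results(cfg, {b: True for b in cfg})
one_bad = global_results(cfg, {"entry": True, "then": True, "else": False, "exit": True})
```

The result for the entry block plays the role of the statement about the whole procedure: it holds only if every block reachable from the entry has its local result.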


This decomposition separates command-level reasoning (local block lemmas) 
from CFG-level reasoning (global block theorems). It enables concise lemmas 
and proofs in Isabelle and makes each comprehensible to a human.


3 A Formal Semantics for Boogie


Our certificates prove that the validity of a VC generated by Boogie formally 
implies correctness of the Boogie CFG-to-DAG source program. This proof relies 
crucially on a formal semantics for Boogie itself. Our first contribution is the first 
such formal semantics for a significant subset of Boogie, mechanised in Isabelle. 
Our semantics uses the Boogie reference manual [29], the presentation of its type 
system [33], and the Boogie implementation for reference; none of those provide 
a formal account of the language. For space reasons, we explain only the key 
concepts of our detailed formalisation here; more details are provided in App. 
A of the TR [37] and the full Isabelle mechanisation is available as part of our 
accompanying artifact [36]. 


3.1 The Boogie Language 


Boogie programs consist of a set of top-level declarations of global variables 
and constants (the global data), axioms, uninterpreted (polymorphic) functions, 
type constructors, and procedures. A procedure declaration includes parameter, 
local-variable, and result-variable declarations (the local data), a pre- and
postcondition, and a procedure body given as a CFG.¹ CFGs are formalised as usual
in terms of basic blocks (containing a possibly-empty list of basic commands), 
and edges; semantically, execution after a basic block continues via any of its 
successors non-deterministically. 


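$$
\begin{aligned}
e &::= x \mid \mathit{false} \mid \mathit{true} \mid i \mid e_1 \ \mathit{bop}\ e_2 \mid \mathit{uop}(e) \mid f[\vec{\tau}](\vec{e}) \mid \mathit{old}(e) \\
  &\;\mid\; \forall x : \tau.\ e \mid \exists x : \tau.\ e \mid \forall_{ty}\, t.\ e \mid \exists_{ty}\, t.\ e \\
\tau &::= \mathit{Int} \mid \mathit{Bool} \mid C(\vec{\tau}) \mid t \\
c &::= \mathtt{assume}\ e \mid \mathtt{assert}\ e \mid x := e \mid \mathtt{havoc}\ x
\end{aligned}
$$


Fig. 2. The syntax of our formalised Boogie subset, where $\tau$, $e$, and $c$ denote the types,
expressions, and basic commands respectively; control-flow is handled via CFGs over
the basic commands. bop and uop denote binary and unary operations, respectively.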


The types, expressions, and basic commands in our Boogie subset are shown 
in Fig. 2. We support the primitive types Int and Bool; types obtained via 
declared type constructors are uninterpreted types; the sets of values such types 
denote are constrained only via Boogie axioms and assume commands. Moreover, 
types can contain type variables (for instance, to specify polymorphic functions). 

Boogie expression syntax is largely standard (e.g. including typical arithmetic
and boolean operations). Old-expressions old(e) evaluate the expression e w.r.t.
the current local data and the global data as it was in the pre-state of the
procedure execution. Boogie expressions also include universal and existential
value quantification (written $\forall x : \tau.\ e$ and $\exists x : \tau.\ e$), as well as universal and
existential type quantification (written $\forall_{ty}\, t.\ e$ and $\exists_{ty}\, t.\ e$). In the latter, t is
bound in e and quantifies over closed Boogie types (i.e. types that do not contain
any type variables).


1 Source-level procedure specifications also include modifies clauses, declaring a set of
global variables the procedure may modify. As we tackle Boogie programs after procedure
calls have been desugared, there are no modifies clauses in our formalisation.

Basic commands form the single-steps of traces through a Boogie CFG; 
sequential composition is implicit in the list of basic commands in a CFG basic 
block and further control flow (including loops) is prescribed by CFG edges. 
Boogie’s basic commands are assumes, asserts, assignments, and havocs; havoc x 
non-deterministically assigns a value matching the type of variable x to x. 

The main Boogie features not supported by our subset are maps and other 
primitive types such as bitvectors. Boogie maps are polymorphic and impredica- 
tive, i.e. one can define maps that contain themselves in their domain. Giving 
a semantic model for such maps in a proof assistant such as Isabelle or Coq is 
non-trivial; we aim to tackle this issue in the future. Modelling bitvectors will 
be simpler, although maintaining full automation may require some additional 
work. 


3.2 Operational Semantics 


Values and State Model. Our formalisation embeds integer and boolean values 
shallowly as their Isabelle counterparts; an Isabelle carrier type for all abstract 
values (those of uninterpreted types) is a parameter of our formalisation. Each 
uninterpreted type is (indirectly) associated with a non-empty subset of abstract 
values via a type interpretation map $\mathcal{T}$ from abstract values to (single) types;
particular interpretations of uninterpreted types can be obtained via different
choices of type interpretation $\mathcal{T}$.

One can understand Boogie programs in terms of the sets of possible traces 
through each procedure body. Traces are (as usual) composed of sequences of 
steps according to the semantics of basic commands and paths through the CFG; 
these can be finite or infinite (representing a non-terminating execution). A trace 
may halt in three cases: (1) an exit block of the procedure is reached in a state 
satisfying the procedure’s postcondition (a complete trace),² (2) an assert A
command is reached in a state not satisfying assertion A (a failing trace), or 
(3) an assume A command is reached in a state not satisfying A (a trace which 
goes to magic and stops). Our formalisation correspondingly includes three kinds 
of Boogie program states: a distinguished failure state F, a distinguished magic 
state M, and normal states N((os, gs, ls)). A normal state is a triple of partial
mappings from variables to values for the old global state (for the evaluation of 
old-expressions), the (current) global state, and the local state, respectively. 
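
The three kinds of states and the four basic commands can be illustrated with a minimal Python sketch. It is not the Isabelle formalisation: the normal state is simplified to a single variable map (the split into old global, global, and local state is dropped), havoc ranges over a small finite `domain` instead of all values of the variable's type, and the names `Store`, `step`, and `run_block` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Store = Dict[str, int]        # simplified normal state: a single variable map
FAILURE, MAGIC = "F", "M"     # the two distinguished non-normal states
State = Union[str, Store]


@dataclass(frozen=True)
class Assume:
    cond: Callable[[Store], bool]


@dataclass(frozen=True)
class Assert:
    cond: Callable[[Store], bool]


@dataclass(frozen=True)
class Assign:
    var: str
    expr: Callable[[Store], int]


@dataclass(frozen=True)
class Havoc:
    var: str


def step(cmd, state: State, domain=range(-2, 3)) -> List[State]:
    """Return every possible successor state of one basic command."""
    if isinstance(state, str):
        # Assumption of this sketch: failure and magic states are absorbing.
        return [state]
    if isinstance(cmd, Assume):
        return [state if cmd.cond(state) else MAGIC]
    if isinstance(cmd, Assert):
        return [state if cmd.cond(state) else FAILURE]
    if isinstance(cmd, Assign):
        return [{**state, cmd.var: cmd.expr(state)}]
    if isinstance(cmd, Havoc):
        # Non-deterministic choice, restricted to a finite domain here.
        return [{**state, cmd.var: v} for v in domain]
    raise TypeError(f"unknown command: {cmd!r}")


def run_block(cmds, state: State, domain=range(-2, 3)) -> List[State]:
    """Lift `step` to the command list of a basic block."""
    states = [state]
    for cmd in cmds:
        states = [s2 for s in states for s2 in step(cmd, s, domain)]
    return states


entry_block = [Assume(lambda s: s["i"] != 0), Assign("j", lambda s: 0)]
```

Running `entry_block` from a state with `i` equal to zero yields only the magic state, while a violated `Assert` yields the failure state; a single `Havoc` yields one successor per value in `domain`.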


Expression Evaluation. An expression e evaluates to value v if the (big-step)
judgement $\mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle e, N(ns) \rangle \Downarrow v$ holds in the context
$(\mathcal{T}, \Lambda, \Gamma, \Omega)$. Here, $\mathcal{T}$ is a type interpretation (as above), and $\Lambda$ is a
variable context: a pair (G, L) of type declarations for the global (G) and local (L)
data. $\Gamma$ is a function interpretation, which maps each function name to a semantic
function mapping a list of types and a list of values to a return value. The type
substitution $\Omega$ maps type variables to types.


2 The case of the postcondition not holding is subsumed under point (2), since Boogie
checks postconditions by generating extra assert statements.


    assume i != 0
    j := 0
    while (i != 0)
      inv j >= 0 ∧ (i = 0 ⇒ j > 0)
    {
      if (i < 5) {
        j := j+1
      }
      i := i-1
    }
    assert j > 0

[Figure: CFG of the program above, with entry block B0, loop head B1, exit block B2, and loop body blocks up to B5.]


Fig. 3. Running example in source code and CFG representation, respectively.

The rules defining this judgement can be found in App. A.2 of the TR [37]. 
For example, the following rule expresses when a universal type quantification 
evaluates to true (t is bound to the quantified type and may occur in e): 


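$$
\frac{\forall \tau.\ \mathit{closed}(\tau) \Longrightarrow \mathcal{T}, \Lambda, \Gamma, \Omega(t \mapsto \tau) \vdash \langle e, ns \rangle \Downarrow \mathit{true}}
     {\mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle \forall_{ty}\, t.\ e, ns \rangle \Downarrow \mathit{true}}
$$


The premise requires one to show that the expression e reduces to true for every
possible type $\tau$ that is closed. In general, expression evaluation is possible only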
for well-typed expressions; we also formalise Boogie’s type system and (for the 
first time) prove its type safety for expressions in Isabelle. 


Command and CFG Reduction. The (big-step) judgement $\mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle c, s \rangle \rightarrow s'$
defines when a command c reduces in state s to state s′; the rules are in
App. A.3 of the TR [37]. This reduction is lifted to lists of commands cs to
model the semantics of a single trace through a CFG block (the judgement
$\mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle cs, s \rangle\ [\rightarrow]\ s'$). The operational semantics of CFGs is modelled by
the (small-step) judgement $\mathcal{T}, \Lambda, \Gamma, \Omega, G \vdash \delta \rightarrow_{CFG} \delta'$, expressing that the CFG
configuration $\delta$ reduces to configuration $\delta'$ in the CFG G. A CFG configuration
is either active or final. An active configuration is given by a tuple $(\mathit{inl}(b_n), s)$,
where $b_n$ is the block identifier indicating the current position of the execution
and s is the current state. A final configuration consists of a tuple $(\mathit{inr}(()), s)$ for
state s (and unit value ()) and is reached at the end of a block that either has
no successors or ends in a magic or failure state.
[Figure: source CFG (left) and target DAG (right) of the running example. In the target, B0 ends with assert A, B1 consists of havoc i, j followed by assume A, and B5 consists of i := i-1, assert A, assume false; the remaining blocks are unchanged.]


Fig. 4. The CFG-to-DAG phase applied to the running example (source is left, target
is right). The back-edge (the red edge from B5 to B1 in the left CFG) is eliminated.
The blue commands are new. A is given by j >= 0 ∧ (i = 0 ⇒ j > 0).


3.3 Correctness 


A procedure is correct if it has no failing traces. This is a partial correctness 
semantics; a procedure body whose traces never leave a loop is trivially cor- 
rect provided that no intermediate assert commands fail. Procedure correctness 
relies on CFG correctness. A CFG G is correct w.r.t. a postcondition Q and a
context $(\mathcal{T}, \Lambda, \Gamma, \Omega)$ in an initial normal state N(ns) if the following holds for all
configurations $(r, s')$:


$$
\begin{aligned}
&\mathcal{T}, \Lambda, \Gamma, \Omega, G \vdash (\mathit{inl}(\mathit{entry}(G)), N(ns)) \rightarrow^{*}_{CFG} (r, s') \Longrightarrow \big[\, s' \neq F \;\wedge \\
&\quad (r = \mathit{inr}(()) \Longrightarrow (\forall ns'.\ s' = N(ns') \Longrightarrow \mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle Q, N(ns') \rangle \Downarrow \mathit{true})) \,\big]
\end{aligned}
$$


where entry(G) is the entry block of G and $\rightarrow^{*}_{CFG}$ is the reflexive-transitive closure
of the CFG reduction. The postcondition is needed only if a final configuration
is reached in a normal state, while failing states must be unreachable. Whenever
we omit Q, we implicitly mean the postcondition to be simply true. In our tool,
we consider only empty initial mappings $\Omega$, since we do not support procedure
type parameters (lifting our work to this feature will be straightforward).

For a procedure p to be correct w.r.t. a context, its body CFG must be correct 
w.r.t. the same context and p’s postcondition, for all initial normal states N(ns) 
that satisfy p’s precondition and which respect the context. For ns to respect a 
context, it must be well-typed and must satisfy the axioms when restricted to its 
constants. We say that p is correct if it is correct w.r.t. all well-formed contexts,
which must have a well-typed function interpretation and a type interpretation 
that inhabits every uninterpreted closed type (and only those). 
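
The shape of the CFG correctness condition (no reachable configuration is in the failure state) can be illustrated with a bounded search in Python. This is only a sketch under simplifying assumptions: states are plain variable maps, commands are deterministic tuples, the postcondition is true, and the search is cut off after `max_steps` block executions, so it can refute correctness but never prove it. The CFG `cfg` and the function names are hypothetical.

```python
FAILURE, MAGIC = "F", "M"


def run_block(cmds, state):
    """Execute the commands of one block on a normal state."""
    for kind, *args in cmds:
        if kind == "assume" and not args[0](state):
            return MAGIC
        if kind == "assert" and not args[0](state):
            return FAILURE
        if kind == "assign":
            var, expr = args
            state = {**state, var: expr(state)}
    return state


def has_failing_trace(cfg, entry, init, max_steps=50):
    """Bounded search for a reachable failure state, starting at `entry`."""
    frontier = [(entry, init)]
    for _ in range(max_steps):
        next_frontier = []
        for block, state in frontier:
            cmds, succs = cfg[block]
            result = run_block(cmds, state)
            if result == FAILURE:
                return True
            if result == MAGIC:
                continue                  # final configuration: the trace stops
            # Execution continues via any successor, non-deterministically.
            next_frontier.extend((succ, result) for succ in succs)
        frontier = next_frontier
    return False


# A small hypothetical CFG: block name -> (commands, successors).
cfg = {
    "B0": ([("assign", "x", lambda s: s["x"] + 1)], ["B1", "B2"]),
    "B1": ([("assume", lambda s: s["x"] > 0),
            ("assert", lambda s: s["x"] >= 1)], []),
    "B2": ([("assume", lambda s: s["x"] <= 0),
            ("assert", lambda s: False)], []),
}
```

From the initial state `{"x": 0}` the branch through B2 goes to magic and no failure is found; from `{"x": -1}` the assert in B2 is reached and the search reports a failing trace.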


Running Example. We will use the simple CFG of Fig. 3 as a running example, 
intended as body of a procedure with trivial (true) pre- and post-conditions. 
The code includes a simple loop with a declared loop invariant, which functions
as a classical Floyd/Hoare-style inductive invariant, and for the moment can
be considered as an implicit assert statement at the loop head. The CFG has 
infinite traces: those which start from any state in which i is negative. Traces 
starting from a state in which i is zero go to magic; they do not reach the loop. 
The program is correct (has no failing traces): all other initial states will result 
in traces that satisfy the loop invariant and the final assert statement. If we 
removed the initial assume statement, however, there would be failing traces: the 
loop invariant check would fail if i were initially zero. 
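
The trace classification described above can be replayed with a direct Python simulation of the running example. This is a sketch, not the Boogie semantics: the loop invariant is treated as an assert at the loop head, non-termination is approximated by a hypothetical iteration bound `max_iterations`, and the function name and the flag `with_initial_assume` are illustrative only.

```python
def run_example(i: int, max_iterations: int = 1000,
                with_initial_assume: bool = True) -> str:
    """Classify the trace of the running example for a given initial value of i."""
    def invariant(i: int, j: int) -> bool:
        # j >= 0 and (i = 0 implies j > 0)
        return j >= 0 and (i != 0 or j > 0)

    if with_initial_assume and i == 0:
        return "magic"                    # assume i != 0 is violated
    j = 0
    for _ in range(max_iterations):
        if not invariant(i, j):
            return "failing"              # invariant check at the loop head
        if i == 0:
            break                         # loop exit
        if i < 5:
            j += 1
        i -= 1
    else:
        return "no result within bound"   # stands in for an infinite trace
    return "complete" if j > 0 else "failing"   # final assert j > 0
```

For instance, `run_example(3)` is classified as complete, `run_example(0)` goes to magic, a negative initial value never leaves the loop within the bound, and `run_example(0, with_initial_assume=False)` fails at the first invariant check.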


4 The CFG-to-DAG Phase 


In this section, we present the validation for the CFG-to-DAG phase in the 
Boogie verifier. This phase is challenging as it changes the CFG structure, inserts 
additional non-deterministic assignments and assume statements, and must do 
so correctly for arbitrary (reducible) nested loop structures, which can include 
unstructured control flow (e.g. jumps out of loops). 


4.1 CFG-to-DAG Phase Overview 


The CFG-to-DAG phase applies to every loop head block identified by Boo- 
gie’s implementation and any back-edges from a block reachable from the loop 
head block back to the loop head (following standard definitions for reducible 
CFGs [21]). Figure 4 illustrates the phase’s effect on our running example. Block
B1 is the (only) loop head here, and the edge from B5 to it is the only back-edge
(completing the looping paths through either branch of the conditional). An assert A
statement starting a loop head (like B1) is interpreted as declaring A to be the loop
invariant.³ The CFG-to-DAG phase performs the following steps:


1. Accumulate a set Xy of all (local and global) variables assigned-to on any
looping path from the loop head back to itself. In our example, Xy is {i, j}.
2. Move the assert A statement declaring a loop invariant (if any) from the
loop head to the end of each preceding block (in our example: B0 and B5).
3. Insert havoc statements at the start of the loop head block per variable in Xy,
followed by a single assume A statement (preceding any further statements).
4. For each block with a back-edge to the loop head, delete the back-edge; if this
leaves the block with no successors, append assume false to its commands.⁴
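
The four steps above can be sketched in Python for a single loop head. This is an illustration, not Boogie's implementation: the loop head, the origins of its back-edges, and the blocks on looping paths are passed in by the caller (Boogie computes them itself), expressions are uninterpreted strings, and the block name `B4e` for the else-branch as well as the function name `cfg_to_dag` are hypothetical.

```python
import copy


def cfg_to_dag(cfg, loop_head, back_edge_sources, loop_blocks):
    """Apply steps 1-4 for one loop head and return the transformed CFG."""
    dag = copy.deepcopy(cfg)

    # Step 1: variables assigned or havoced on looping paths.
    modified = sorted({cmd[1] for b in loop_blocks for cmd in cfg[b]["cmds"]
                       if cmd[0] in ("assign", "havoc")})

    # The leading asserts of the loop head form the invariant.
    head_cmds = cfg[loop_head]["cmds"]
    n = 0
    while n < len(head_cmds) and head_cmds[n][0] == "assert":
        n += 1
    invariant, rest = head_cmds[:n], head_cmds[n:]

    # Step 3 (done first so that a self-loop is handled by step 2 below):
    # havoc the modified variables, then assume the invariant.
    dag[loop_head]["cmds"] = ([("havoc", x) for x in modified]
                              + [("assume", e) for _, e in invariant] + rest)

    # Step 2: assert the invariant at the end of every predecessor.
    for name, block in cfg.items():
        if loop_head in block["succs"]:
            dag[name]["cmds"] = dag[name]["cmds"] + invariant

    # Step 4: cut back-edges; dead ends are closed with assume false.
    for name in back_edge_sources:
        dag[name]["succs"] = [s for s in dag[name]["succs"] if s != loop_head]
        if not dag[name]["succs"]:
            dag[name]["cmds"] = dag[name]["cmds"] + [("assume", "false")]
    return dag


# The running example, with A standing for the loop invariant.
cfg = {
    "B0": {"cmds": [("assume", "i != 0"), ("assign", "j", "0")], "succs": ["B1"]},
    "B1": {"cmds": [("assert", "A")], "succs": ["B2", "B3"]},
    "B2": {"cmds": [("assume", "i == 0"), ("assert", "j > 0")], "succs": []},
    "B3": {"cmds": [("assume", "i != 0")], "succs": ["B4", "B4e"]},
    "B4": {"cmds": [("assume", "i < 5"), ("assign", "j", "j + 1")], "succs": ["B5"]},
    "B4e": {"cmds": [("assume", "!(i < 5)")], "succs": ["B5"]},
    "B5": {"cmds": [("assign", "i", "i - 1")], "succs": ["B1"]},
}
dag = cfg_to_dag(cfg, "B1", ["B5"], ["B1", "B3", "B4", "B4e", "B5"])
```

On this input, the loop head of `dag` starts with havocs of `i` and `j` followed by `assume A`, both B0 and B5 end with `assert A`, and B5 additionally ends with `assume false` because deleting its back-edge leaves it without successors.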


The havoc-then-assume sequence introduced in step 3 can be understood as
generating traces for arbitrary values of Xy satisfying the loop invariant A,
effectively over-approximating the set of states reachable at the loop head in the
original program. In particular, the remnants of any originally looping path (e.g.
from B1 through the loop body to B5) enforce that any non-failing trace starting
from any such state must (due to the assert added to block B5 in step 2) result in a
state which re-establishes the loop invariant. Such paths exist only to enforce this
inductive step (analogously to the premise of a Hoare logic while rule); so long as the
assert succeeds, we can discard these traces via step 4.


3 In general, multiple asserts at the beginning of a loop head may form the invariant.
4 Omitting assume false if there are no successors would be incomplete, since otherwise
the postcondition would have to be satisfied.

While we illustrate this step on a simple CFG, in general a loop head may 
have multiple back-edges, looping structures may nest, and edges may exit multi- 
ple loops. For the above translation to be correct, the CFG must be reducible and 
loop heads and corresponding back-edges identified accurately, which is complex 
in general. Importantly (but perhaps surprisingly), our work makes this phase 
of Boogie certifying without explicitly verifying (or even defining) these notions. 


4.2 CFG-to-DAG Certification: Local Block Lemmas 


We define first our local block lemmas for this phase. Recall that these prove 
that if executing the statements of a target block yields no failing executions, 
the same holds for the corresponding source block; this result is trivial for source 
blocks other than loop heads and their immediate predecessors, since these are 
unchanged in this phase. To enable eventual composition of our block lemmas, 
we need to also reflect the role of the assume and assert statements employed 
in this phase. The formal statement of our local block lemmas is as follows:⁵


Theorem 1 (CFG-to-DAG Local Block Lemma). Let B be a source block
with commands $cs_S$, whose corresponding target block has commands $cs_T$. If B is
a loop head, let Xy be as defined in CFG-to-DAG step 1 (and empty otherwise)
and let $A_{pre}$ be its loop invariant (or true otherwise). If B is a predecessor of a
loop head, let $A_{post}$ be the loop invariant of its successor (and true otherwise).


Then, if:


1. $\mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle cs_S, N(ns_1) \rangle\ [\rightarrow]\ s_1'$

2. $\forall s_2'.\ \mathcal{T}, \Lambda, \Gamma, \Omega \vdash \langle cs_T, N(ns_2) \rangle\ [\rightarrow]\ s_2' \Longrightarrow s_2' \neq F$

3. $A_{pre}$ is satisfied in $ns_1$, and $ns_2$ differs from $ns_1$ only on variables in Xy and
variables not defined in $\Lambda$


then: $s_1' \neq F$ and if $s_1'$ is a normal state, then (1) $A_{post}$ is satisfied in $s_1'$, and (2)
if no assume false was added at the end of $cs_T$, then there is a target execution
in $cs_T$ from $N(ns_2)$ that reaches a normal state that differs from $s_1'$ only on
variables not defined in $\Lambda$.


The gist of this lemma is to capture locally the ideas behind the four steps of 
the phase. For example, consequence (1) reflects that after the transformation, 
any blocks that were previously predecessors of a loop head (B0 and B5 in our
running example) will have an assert statement checking for the corresponding 
invariant (and so if the target program has no failing traces, in each trace this 
invariant will be true at that point). 


5 We omit some details regarding well-typedness, handled fully in our formalisation. 
[Figure: the four source blocks of the branch (left) and their passified target blocks (right). Target commands from top to bottom: assume i1 != 0; then-branch: assume i1 < 5, assume j3 = j2+1, assume j4 = j3; else-branch: assume !(i1 < 5), assume j4 = j2; join block: assume i2 = i1-1, assert j4 >= 0 ∧ (i2 = 0 ⇒ j4 > 0), assume false.]


Fig. 5. The passification phase applied to the branch in the running example with the
result on the right. The final (green) commands in the two target branch blocks are the
synchronisation commands. At the uppermost blocks shown here, the current versions
of i and j are i1 and j2, respectively. The full CFGs are shown in App. B of the TR [37].


4.3 CFG-to-DAG Certification: Global Block Theorems 


We lift our certification to all traces through the source and target CFGs; the 
statement of the corresponding global block theorems is similar to that of local
block lemmas lifted to CFG executions, and for space reasons we do not present
it here, but it is included in our Isabelle formalisation. In particular, we prove 
for each block (working in reverse topological order through the target CFG 
blocks) that if executions starting in the target CFG block never fail, neither do 
any executions starting from the corresponding source CFG block, and looping 
paths modify at most the variables havoced according to step 3 of the phase. 

The major challenge in these proofs is reasoning about looping paths in 
the source CFG, since these revisit blocks. To solve this challenge, we perform 
inductive arguments per loop head in terms of the number of steps remaining in 
the trace in question.⁶ Our global block theorem for a block B then carries as
an assumption an induction hypothesis for each loop that contains B. Proving 
a global block theorem for the origin of a back-edge is taken care of by applying 
the corresponding induction hypothesis. 

This proof strategy works only if we have obtained the induction hypothesis 
for the loop head before we use the global block theorem of the origin of a 
back-edge (otherwise we cannot discharge the block theorem’s hypothesis). In 
other words, our proof implicitly shows the necessary requirement that loop 
heads (as identified by Boogie) dominate all back-edges reaching them without us 
formalising any notion of domination, CFG reducibility, or any other advanced 
graph-theoretic concept. This shows a major benefit of our validation approach 
over a once-and-for-all verification of Boogie itself: our proofs indirectly check 
that the identification of loop heads and back-edges guarantees the necessary 
semantic properties without being concerned with how Boogie’s implementation 
computes this information. 


6 This may seem insufficient since traces can be infinite, but importantly a failing 
trace is always finite, and our theorems need only eliminate the chance of failing 
traces. 
Our approach applies equally to nested loops and more generally to reducible
CFG structures; all corresponding induction hypotheses are carried through 
from the visited loop heads. The requirement that no more than the havoced 
variables Xy are modified in the source program is easily handled by showing
that variables modified in an inner loop are a subset of those in outer loops.
As for all of our results, our global block theorems are proven automatically in
Isabelle per Boogie procedure, providing per-run certificates for this phase. 


5 The Passification Phase 


In this section, we describe the validation of the passification phase in the Boo- 
gie verifier. Unlike the previous phase, passification makes no changes to the 
CFG structure, but makes substantial changes to the program states (via SSA- 
like renamings), substantially increases non-determinism, and employs assume 
statements to re-tame the sets of possible traces. 


5.1 Passification Phase Overview 


The main goal of passification is to eliminate assignments such that a more effi- 
cient VC can be ultimately generated [6,18,30]. In the Boogie verifier, this is 
implemented as a single transformation phase that can be thought of as two 
independent steps. Firstly, the source CFG is transformed into static single 
assignment (SSA) form, introducing versions (fresh variables) for each origi- 
nal program variable such that each version is assigned at most once in any 
program trace. In a second step, variable assignments are completely eliminated: 
each assignment command x := e is replaced by assume x = e. Havoc statements 
are simply removed; their effect is implicit in the fact that a new variable version 
is used (via the SSA step) after such a statement. 
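
The two steps can be combined into a short Python sketch that passifies one block at a time and synchronises versions at a join. It is a simplification rather than Boogie's algorithm: expressions are strings renamed with a regular expression, versions are per-variable counters, and the helper names `rename`, `passify_block`, and `synchronise` are hypothetical. The starting versions (i at 1, j at 2) are those of the branch in the running example.

```python
import re


def rename(expr, versions):
    """Replace each source variable in expr by its current version (j -> j2)."""
    def versioned(match):
        name = match.group(0)
        return f"{name}{versions[name]}" if name in versions else name
    return re.sub(r"\b[A-Za-z_]\w*\b", versioned, expr)


def passify_block(cmds, versions, fresh):
    """Passify one block; `fresh` holds the next unused version per variable."""
    versions, fresh = dict(versions), dict(fresh)
    out = []
    for cmd in cmds:
        if cmd[0] in ("assume", "assert"):
            out.append((cmd[0], rename(cmd[1], versions)))
        elif cmd[0] == "assign":
            _, var, expr = cmd
            rhs = rename(expr, versions)      # reads use the incoming version
            versions[var] = fresh[var]
            fresh[var] += 1
            out.append(("assume", f"{var}{versions[var]} = {rhs}"))
        elif cmd[0] == "havoc":
            versions[cmd[1]] = fresh[cmd[1]]  # havoc leaves no command behind
            fresh[cmd[1]] += 1
    return out, versions, fresh


def synchronise(branches, fresh):
    """Introduce one common version per variable whose versions disagree."""
    fresh, merged = dict(fresh), dict(branches[0])
    extra = [[] for _ in branches]
    for var in merged:
        if len({b[var] for b in branches}) > 1:
            merged[var] = fresh[var]
            fresh[var] += 1
            for k, b in enumerate(branches):
                extra[k].append(("assume", f"{var}{merged[var]} = {var}{b[var]}"))
    return extra, merged, fresh


v0, f0 = {"i": 1, "j": 2}, {"i": 2, "j": 3}
then_cmds, v_then, f1 = passify_block(
    [("assume", "i < 5"), ("assign", "j", "j + 1")], v0, f0)
else_cmds, v_else, f2 = passify_block([("assume", "!(i < 5)")], v0, f1)
sync, v_join, f3 = synchronise([v_then, v_else], f2)
join_cmds, _, _ = passify_block(
    [("assign", "i", "i - 1"), ("assert", "j >= 0 && (i == 0 ==> j > 0)")],
    v_join, f3)
```

With these inputs the then-branch constrains `j3`, both branches receive a synchronisation assume for `j4`, and the join block asserts the invariant over `j4` and `i2`.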

Figure 5 shows the effect of this phase on four blocks of our running example 
(the full figure of the target CFG is shown in App. B of the TR [37]). The 
commands inserted just before the join block (here, B5) introduce a consistent
variable version (here, j4) for use in the join block. It is convenient to speak of 
target variables in terms of their source program counterparts: we say e.g. that 
j has version 4 on entry to block B5. 

Compared to traces through the source program, the space of variable values 
in a trace through the target program is initially much larger; each version may, 
on entry to the CFG, have an arbitrary value. For example, j4 may have any
value on entry to the target blocks of Fig. 5; traces in which its value does not
satisfy the constraint of the assume statements in the two branch blocks will go to
magic and not reach the join block.
Importantly, however, not all traces go to magic; enough are preserved to simu- 
late the executions of the original program: each assume statement constrains the 
value of exactly one variable version, and the same version is never constrained 
more than once. Capturing this delicate argument formally is the main challenge 
in certifying this step. 
As extra parts of the passification phase, the Boogie verifier performs constant
propagation and desugars old-expressions (using variable versions appropriate to 
the entry point of the CFG). We omit their descriptions here for brevity, but our 
implementation certifies them. 


5.2 Passification Certification: Local Block Lemmas 


To validate the passification phase, it is sufficient to show that each source execu- 
tion is simulated by a corresponding target execution, made precise by construct- 
ing a relation between the states in these executions. Such forward simulation 
arguments are standard for proving correctness of compilers for deterministic 
languages. However, the situation here is more complex due to the fact that 
the target CFG has a much wider space of traces: the values of each versioned 
variable in the target program are initially unconstrained, meaning traces exist 
for all of their combinations. On the other hand, many of these traces do not 
survive the assume statements encountered in the target program. Picking the 
correct single trace or state to simulate a particular source execution would 
require knowledge of all variable assignments that are going to happen, which 
is not possible due to non-determinism and would preclude the block-modular 
proof strategies that our validation approach employs. 

Instead, we generalise this idea to relating each single source state s with a
set T of corresponding target program states. We define variable relations $V_R$ at
each point in a trace, making explicit the mappings used in the SSA step between
source program variables and their corresponding versions. For example, on entry
to block B4 in the source version of our running example (and correspondingly on
entry to its counterpart in the target), the $V_R$ relation relates i to i1 and j to j2.
All states $t \in T$ must precisely agree with s w.r.t. $V_R$ (e.g., s(i) = t(i1), s(j) = t(j2)).
On the other hand, our sets of states T are defined to be completely unconstrained
(besides typing) for future variable versions. For example, for every $t \in T$ at the same
point in our example, there will be states in T assigning each possible value (of
the same type) to i2 (and otherwise agreeing with t).

More precisely, for a set of variables X, we say that a set of states T constrains
at most X w.r.t. variable context $\Lambda$ if, for every $t \in T$, every variable $z \notin X$ that
is in $\Lambda$, and every value v of z’s type, we have $t[z \mapsto v] \in T$. In other words, the
set T is closed under arbitrary changes to values of all variables in $\Lambda$ but not in X.
We construct our sets T such that they constrain at most current and past versions
of program variables. It is this fact that enables us to handle subsequent assume
statements in the target program and, in particular, to show that the set of possible
traces in the target program never becomes empty while there are possible traces in
the source program. For example, when relating the source command j := j+1
with the target command assume j3 = j2 + 1 in the corresponding target block, we use
the fact that our set of states does not constrain j3 to prove that, although many traces
go to magic at this point, for a non-empty set of states $T' \subseteq T$ (those in which
j3 has the “right” value equal to j2 + 1), execution continues in the target.
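
The closure property behind "constrains at most X" can be checked by brute force on small examples. The Python sketch below assumes a finite `domain` standing in for the values of each variable's type and represents states as dictionaries; the function name `constrains_at_most` is hypothetical.

```python
def freeze(state):
    """Hashable view of a state, so that sets of states can be searched."""
    return frozenset(state.items())


def constrains_at_most(T, X, context, domain):
    """True iff T is closed under updates of variables in context but not in X."""
    frozen = {freeze(t) for t in T}
    for t in T:
        for z in context:
            if z in X:
                continue
            for v in domain:
                if freeze({**t, z: v}) not in frozen:
                    return False
    return True


context, domain = ["j2", "j3"], range(3)
# All target states that agree with a source state where j = 1 (via j2).
T = [{"j2": 1, "j3": v} for v in domain]
# States that survive the target command assume j3 = j2 + 1.
T_after = [t for t in T if t["j3"] == t["j2"] + 1]
```

Here `T` constrains at most `{"j2"}`, so filtering by the assume on `j3` leaves the non-empty subset `T_after`, which constrains at most `{"j2", "j3"}` but no longer at most `{"j2"}`.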
We now make these notions more precise by showing the definition of our
local block lemmas for the passification phase (See footnote 5). 


Theorem 2 (Passification Local Block Lemma). Let B be a source block
with commands $cs_1$, whose corresponding target block has commands $cs_2$; let $V_R$
and $V_R'$ be the variable relations at the beginning and end of B, respectively. Let
X be a set of variable versions, and N(ns) be a normal state. Let T be a non-empty
set of normal states such that N(ns) agrees with T according to $V_R$, and
T constrains at most X w.r.t. $\Lambda_2$. Furthermore, let Y be the variable versions
corresponding to the targets of assignment and havoc statements in $cs_1$. If both


1. $\mathcal{T}, \Lambda_1, \Gamma, \Omega \vdash \langle cs_1, N(ns) \rangle\ [\rightarrow]\ s' \;\wedge\; s' \neq M$
2. $X \cap Y = \emptyset$


then there exists a non-empty set of normal states $T' \subseteq T$ s.t. $T'$ constrains at
most $X \uplus Y$ w.r.t. $\Lambda_2$ and for each $t' \in T'$, there exists a state $t^{*}$ s.t.


1. $\mathcal{T}, \Lambda_2, \Gamma, \Omega \vdash \langle cs_2, t' \rangle\ [\rightarrow]\ t^{*} \;\wedge\; (s' = F \Longrightarrow t^{*} = F)$
2. If $s'$ is a normal state, then $s'$ and $t'$ are related w.r.t. $V_R'$ (and $t^{*} = t'$).


This lemma captures our generalised notion of forward simulation appropriately. 
The first conclusion expresses that the target does not get stuck and that failures 
are preserved, while the second shows that if the source execution neither fails nor 
stops then the resulting states are related. Note that premise 2 is essential in the 
proof to guarantee that the assume statements introduced by passification do not 
eliminate the chance to simulate source executions; the condition expresses that 
the variable versions newly constrained do not intersect with those previously 
constrained. To prove these lemmas over the commands in a single block, we are 
forced to check that the same version is not constrained twice. 


5.3 Passification Certification: Global Block Theorems 


As for all phases, we lift our local block lemmas to theorems certifying all exe- 
cutions starting from a particular block, and thus, ultimately, to entire CFGs. 
For the passification phase, most of the conceptual challenges are analogous 
to those of the local block lemmas; we similarly employ $V_R$ relations between
source variables and their corresponding target versions. To connect with our 
local block lemmas (and build up our global block theorems, which we do back- 
wards through the CFG structure), we repeatedly require the key property that 
the set of variable versions constrained in our executions so far is disjoint from 
those which may be constrained by a subsequent assume statement (cf. premise 2 
of our local block lemma above). Explicitly tracking and checking disjointness
of these concrete sets of variables is simple, but turns out to be expensive in
Isabelle when the sets are large.

We circumvent this issue with our own global versioning scheme (as opposed
to the versions used by Boogie, which are independent for different source
variables): according to the CFG structure, we assign a global version number $\mathit{ver}_G(x)$
to each variable x in the target program such that, if x is constrained in a target
block B′ and y is constrained in another target block B″ reachable from B′,
then $\mathit{ver}_G(x) < \mathit{ver}_G(y)$. Such a consistent global versioning always exists in the
target programs generated by Boogie because the only variables not constrained
exactly once in the program are those used to synchronise executions (e.g. j4
in Fig. 5), which always appear right before branches are merged. We can now
encode our disjointness properties much more cheaply: we simply compare the 
maximal global version of all already-constrained variables with the minimal 
global version of those (potentially) to be constrained. Since we represent vari- 
ables as integers in the mechanisation, we directly use our global version as the 
variable name for the target program; there is no need for an extra lookup table. 
Note that (readability aside) it makes no difference which variable names are
used in intermediate CFGs; we ultimately care only about validating the original 
CFG. 
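
The cheaper disjointness check can be illustrated in a few lines of Python. The sketch assumes variables are already represented by their global version numbers (plain integers); the function name `disjoint_by_versions` and the concrete numbers are hypothetical. Note that the comparison is a sufficient condition only: it may reject sets that are in fact disjoint, which is harmless when the versioning respects the CFG order as described above.

```python
def disjoint_by_versions(constrained, to_constrain):
    """Sufficient check: every already-constrained version is smaller than
    every version that may still be constrained."""
    if not constrained or not to_constrain:
        return True
    return max(constrained) < min(to_constrain)


# Hypothetical global versions for the variables of Fig. 5,
# numbered along the CFG: i1 -> 0, j2 -> 1, j3 -> 2, j4 -> 3, i2 -> 4.
already_constrained = {0, 1, 2}
next_constrained = {3}
```

A single comparison of a maximum with a minimum replaces the element-wise intersection test; for `already_constrained` and `next_constrained` both agree that the sets are disjoint.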


6 The VC Phase 


In this section, we present the validation of the VC phase in the Boogie verifier. 
This phase has two main aspects: (1) it encodes and desugars all aspects of the 
Boogie type system, employing additional uninterpreted functions and axioms to 
express its properties [33]; program expression elements such as Boogie functions 
are analogously desugared in terms of these additional uninterpreted functions, 
creating a non-trivial logical gap between expressions as represented in the VC 
and those from the input program. (2) It performs an efficient (block-by-block) 
calculation of a weakest precondition for the (acyclic, passified) CFG, resulting 
in a formula characterising its verification requirements, subject to background 
axioms and other hypotheses. 


6.1 VC Structure 


The generated VC has the following overall structure (represented as a shallow
embedding in our certificates)⁷:


$$
\underbrace{\forall\ \text{VC quantifiers}}_{\substack{\text{type encoding parameters,} \\ \text{functions, variable values}}} .\ \big(\ \underbrace{\text{VC assumptions}}_{\substack{\text{type encoding,} \\ \text{func./var./prog. axioms}}} \Longrightarrow\ \text{CFG WP}\ \big)
$$


The VC quantifies over parameters required for the type encoding, as well as
VC counterparts representing the variable values and functions in the Boogie
program. The VC body is an implication, whose premise contains: (1) assumptions
that axiomatise the type encoding parameters, (2) axioms expressing the
typing of Boogie variables and functions, and (3) assumptions directly relating
to axioms explicitly declared in the Boogie program. The conclusion of the
implication is an optimised version of the weakest (liberal) precondition (WP) of the
CFG.⁸


7 Note that top-level quantification over functions is implicit in the (first-order) SMT
problem generated by Boogie; we quantify explicitly in our Isabelle representation.
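
The block-by-block weakest (liberal) precondition over a passified DAG can be sketched in Python using the standard rules wp(assume e, Q) = (e ⇒ Q) and wp(assert e, Q) = (e ∧ Q). The sketch is an assumption-laden illustration rather than Boogie's optimised encoding: the WP is evaluated on concrete states instead of being built as a formula, the universal quantification of the VC is replaced by enumeration over a small finite `domain`, and the type encoding is omitted. The block names and the helpers `block_wp` and `vc_holds` are hypothetical; the DAG is the target fragment of Fig. 5.

```python
from itertools import product


def block_wp(dag, block, state, memo):
    """Evaluate the WP of `block` in `state`, computing each block only once."""
    if block not in memo:
        cmds, succs = dag[block]
        # A block without successors has postcondition true.
        q = all(block_wp(dag, s, state, memo) for s in succs)
        for kind, cond in reversed(cmds):
            holds = cond(state)
            q = (not holds or q) if kind == "assume" else (holds and q)
        memo[block] = q
    return memo[block]


def vc_holds(dag, entry, variables, domain, assumption=lambda s: True):
    """Check (assumption => WP(entry)) for all variable values in the domain."""
    for values in product(domain, repeat=len(variables)):
        state = dict(zip(variables, values))
        if assumption(state) and not block_wp(dag, entry, state, {}):
            return False
    return True


dag = {
    "B3": ([("assume", lambda s: s["i1"] != 0)], ["B4", "B4e"]),
    "B4": ([("assume", lambda s: s["i1"] < 5),
            ("assume", lambda s: s["j3"] == s["j2"] + 1),
            ("assume", lambda s: s["j4"] == s["j3"])], ["B5"]),
    "B4e": ([("assume", lambda s: not s["i1"] < 5),
             ("assume", lambda s: s["j4"] == s["j2"])], ["B5"]),
    "B5": ([("assume", lambda s: s["i2"] == s["i1"] - 1),
            ("assert", lambda s: s["j4"] >= 0 and (s["i2"] != 0 or s["j4"] > 0)),
            ("assume", lambda s: False)], []),
}
VARS = ["i1", "i2", "j2", "j3", "j4"]


def invariant(s):
    # The loop invariant over the incoming versions i1 and j2.
    return s["j2"] >= 0 and (s["i1"] != 0 or s["j2"] > 0)
```

Under the invariant as assumption, `vc_holds(dag, "B3", VARS, range(-2, 7), invariant)` succeeds on this fragment, whereas dropping the assumption admits a counterexample (for instance a negative `j2` on the else-branch).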


6.2 Boogie’s Logical Encoding of the Boogie Type System 


We first briefly explain Boogie’s logical encoding of its own type system. Values 
and types are represented at the VC level by two uninterpreted carrier sorts 
V and T. An uninterpreted function typ from V to T maps each value to the 
representation of its type. Boogie type constructors are each modelled with an 
(injective) uninterpreted function C with return sort T and taking arguments 
(per constructor parameter) of sort T. For example, a type constructor List(t) 
is represented by a VC function from T to T. Projection functions are also 
generated for each type constructor ($C^{-1}_i$ for each type argument at position i),
e.g. mapping the representation of a type List(t) to the representation of type t. 

This encoding is then used in the VC to recover Boogie typing constraints for 
the untyped VC terms. Recovering the constraints is not always straightforward 
due to optimisations performed by Boogie. For example, the VC translation
of the Boogie expression $\forall_{ty}\, t.\ \forall x : List(t).\ e$ no longer quantifies over types;
all original occurrences of t in e have been translated to $List^{-1}_1(typ(x))$. This
optimisation reflects that this particular type quantification is redundant, since
t can be recovered from the type of x.⁹


6.3 Working from VC Validity 


Our certificates assume that the generated VC is valid (certifying the validity- 
checking of the VC by an SMT solver is an orthogonal concern). However, con- 
necting VC validity back to block-level properties about the specific program 
requires a number of technical steps. We need to construct Isabelle-level seman- 
tic values to instantiate the top-level quantifiers in the VC such that the corre- 
sponding VC assumptions (left-hand side of the VC) can be proved and, thus, 
validity of the corresponding WP can be deduced. Moreover, we must ensure 
that our instantiation yields a WP whose validity implies correctness of the Boo- 
gie program. For example, a top-level VC quantifier modelling a Boogie function 
f must be instantiated with a mathematical function that behaves in the same 
way as f for arguments of the correct type. 

We instantiate the carrier sort V for values in the VC with the corresponding
type denoting Boogie values in our formalisation; the carrier sort T for types
is instantiated to be all Boogie types that do not contain free variables (i.e.
closed types). Constructing explicit models for the quantified functions used to
model Boogie’s type system (satisfying, e.g., suitable inverse properties for the
projection functions) is straightforward. For the VC-level variable values, we can
directly instantiate the corresponding values in the initial Boogie program state.


8 One difference in our version of the Boogie verifier is that we switched off the
generation of extra variables introduced to report error traces [32]; these are redundant
for programs that do not fail and further complicate the VC structure.

9 Note that in the VC the quantification over x ranges over all values of sort V. An
implication is used to consider only those x for which $typ(x) = List(List_1^{-1}(typ(x)))$.

VC-level functions representing those declared in the Boogie program are 
instantiated as (total) functions which, for input values of appropriate type (the 
arguments and output are untyped values of sort V), are defined simply to return 
the same values as the corresponding function in our model. However, perhaps 
surprisingly, Boogie’s VC embedding of functions logically requires functions to 
return values of the specified return type even if the input values do not have the 
types specified by the function. In such cases, we define the instantiated function 
to return some value of the specified type, which is possible since in well-formed 
contexts every closed type has at least one value in our model. 
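
The instantiation of a declared function can be sketched as follows. This is a
simplified Python model under our own assumptions: Python's built-in types
stand in for closed Boogie types, `type` plays the role of typ, and the helper
names instantiate and default_of are hypothetical.

```python
def default_of(ty):
    """Some value of the given type; int() is 0, bool() is False, str() is ''."""
    return ty()


def instantiate(f, arg_types, ret_type):
    """Build a total function on untyped values from a typed function f."""

    def f_vc(*args):
        well_typed = len(args) == len(arg_types) and all(
            type(a) is t for a, t in zip(args, arg_types)
        )
        if well_typed:
            # On inputs of the declared types, agree with the modelled function.
            return f(*args)
        # Otherwise still return a value of the declared return type.
        return default_of(ret_type)

    return f_vc


# A hypothetical Boogie function  function inc(x: int): int
inc_vc = instantiate(lambda x: x + 1, (int,), int)
```

Here inc_vc(3) agrees with the modelled function, while inc_vc("a") returns an
arbitrary but correctly typed integer, which is all the VC embedding requires.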

After our instantiation, we need to prove the hypotheses of the VC’s impli- 
cation; in particular that all axioms (both those generated by the type system 
encoding and those coming from the program itself) are satisfied. The former 
are standard and simple to prove (given the work above), while the latter largely 
follow from the assumption that each declared axiom must be satisfied in the 
initial state restricted to the constants. The only remaining challenge is to relate 
VC expressions with the evaluation of corresponding Boogie expressions; an issue 
which also arises (and is explained) below, where we show how to connect validity 
of the instantiated WP to the program. 


6.4 Certifying the VC Phase 


Boogie’s weakest precondition calculation is made size-efficient by the usage 
of explicit named constants for the weakest preconditions wp( B, true) for each 
block B, which is defined in terms of the named constants for its successor blocks. 
For example, for a block $B_i$ in Fig. 5 with successor blocks $B_j$ and $B_k$,
$wp(B_i, true)$ is given by $i1^{vc} \neq 0 \Rightarrow wp(B_j, true) \wedge wp(B_k, true)$.
Here $i1^{vc}$ is the value that we instantiated for the variable i1.

We exploit this modular construction of the generated weakest precondition 
for the local and global block theorems. We prove for each block B with com- 
mands cs the following local block lemma: 


Theorem 3 (VC Phase Local Block Lemma). 
If $A, \Lambda, \Gamma, \Omega \vdash \langle cs, N(ns) \rangle\ [\rightarrow]\ s'$ and $wp(B, true)$ holds, then $s' \neq F$ and if $s'$ is
a normal state, then $\forall B_{suc} \in successors(B).\ wp(B_{suc}, true)$.


Once one has proved this lemma for all blocks in the CFG, combining them 
to obtain the corresponding global block theorems (via our usual reverse walk 
of the CFG) is straightforward. The main challenge is in decomposing the proof 
for the local block lemma itself for a block B, for which we outline our approach 
next. 

By this phase, the first command in B must be either an assume e or an
assert e command. In the former case, we rewrite wp(B, true) into the form
$e^{vc} \Rightarrow H$, where $e^{vc}$ is the VC counterpart of e and where H corresponds
to the weakest precondition of the remaining commands. This rewriting may
involve undoing certain optimisations Boogie’s implementation performed on the
formula structure. Next, we need to prove that e evaluates to $e^{vc}$ (see below).
Hence, if e evaluates to true (the execution does not go to magic) then H must
be true, and we can continue inductively. The argument for assert e is similar
but where we rewrite the VC to $e^{vc} \wedge H$ (i.e. $e^{vc}$ and H must both hold); if e
evaluates to $e^{vc}$, we know that the execution does not fail.
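
The shape of this weakest precondition can be sketched in a few lines of
Python. This is a minimal illustration under our own assumptions, not Boogie's
implementation: the three-block DAG, the block names and the representation of
expressions as Python predicates over an instantiated state are hypothetical,
and memoisation stands in for the named constants.

```python
from functools import lru_cache

# Hypothetical passified DAG: block name -> (commands, successor names).
# Each command is ("assume", e) or ("assert", e), where e maps the
# instantiated VC state to a Boolean (a stand-in for e^{vc}).
BLOCKS = {
    "B0": ([("assume", lambda s: s["i1"] != 0)], ("B1", "B2")),
    "B1": ([("assert", lambda s: s["i1"] != 0)], ()),
    "B2": ([("assert", lambda s: s["i1"] > 0)], ()),
}


def make_wp(blocks, state):
    """Return wp(name), evaluating wp(B, true) for the block called `name`."""

    @lru_cache(maxsize=None)  # one cached "named constant" per block
    def wp(name):
        commands, successors = blocks[name]
        # H at the end of the block: the wp constants of all successors hold.
        h = all(wp(suc) for suc in successors)
        for kind, e in reversed(commands):
            value = bool(e(state))
            if kind == "assume":
                h = (not value) or h  # e^{vc} ==> H
            elif kind == "assert":
                h = value and h  # e^{vc} /\ H
            else:
                raise ValueError(f"unexpected command {kind!r}")
        return h

    return wp
```

For i1 = 0 the assume in B0 is false, so wp(B0, true) holds even though the
assert in B1 would fail; this mirrors executions that go to magic.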

Proving that e evaluates to $e^{vc}$ arises in both cases and also in our previous
discharging of VC hypotheses. Note that, in contrast to e, $e^{vc}$ is not a Boogie
expression, but a shallowly embedded formula that includes the instantiations of 
quantified variables we constructed above. Showing this property works largely 
on syntax-driven rules that relate a Boogie expression with its VC counterpart, 
except for extra work due to mismatching function signatures and optimisations 
that Boogie made either to the formula structure or via the type system encoding 
(cf. Sect. 6.2). We handle some of these cases by showing that we can rewrite 
the formula back into the unoptimised standard form we require for our syntax- 
driven rules and in other cases we directly work with the optimised form. Both 
cases are automated using Isabelle tactics. 

This concludes our discussion of the certification of Boogie’s three key phases. 
Combining the three certificates yields an end-to-end proof that the validity of 
the generated verification conditions implies the correctness of the input program, 
that is, that the given verification run is sound. 


7 Implementation and Evaluation 


In this section, we evaluate our certifying version of the Boogie verifier [36], 
which produces Isabelle certificates proving the correctness of Boogie’s pipeline 
for programs it verifies. 

We have implemented our validation tool as a new C# module compiled with 
Boogie. We instrumented Boogie’s codebase to call out to our module, which 
allows us to obtain information that we can use to validate the key phases, and 
extended parts of the codebase to extract information more easily. Moreover, we 
disabled counter-example related VC features and the generation of VC axioms 
for any built-in types and operators that we do not support. We added or changed 
fewer than 250 non-empty, uncommented lines of code across 11 files in the 
existing Boogie implementation. 

Given an input file verified by Boogie, our work produces an Isabelle certifi- 
cate per procedure p that certifies the correctness of the corresponding CFG-to- 
DAG source CFG as represented internally in Boogie. The generation and check- 
ing of the certificate is fully automatic, without any user input. We use a combi- 
nation of custom and built-in Isabelle tactics. In addition to the three key phases 
we describe in detail, our implementation also handles several smaller transforma- 
tions made by Boogie, such as constant propagation. Our tool currently supports 
the default options of Boogie (only) and does not support advanced source-level 
attributes (for instance, to selectively force procedures to be inlined). 
Table 1. Selection of algorithmic examples with the lines of code (LOC), the number
of procedures (#P), the time it takes for Isabelle to check the certificate in seconds (the
average of 5 runs on a Lenovo T480 with 32 GB, i7-8550U 1.8 GHz, Ubuntu 18.04 on
the Windows Subsystem for Linux), and the certificate size expressed as the number
of non-empty lines of Isabelle.


Name                   | LOC | #P | Time [s] | Size
TuringFactorial        |  29 |  1 |     19.4 | 1986
Find                   |  27 |  2 |     27.3 | 2100
DivMod                 |  69 |  2 |     28.4 | 4753
Summax [27]            |  23 |  1 |     19.1 | 1953
MaxOfArray [12]        |  22 |  1 |     19.9 | 1944
SumOfArray [12]        |  22 |  1 |     18.7 | 1534
Plateau [12]           |  50 |  1 |     22.9 | 2019
WelfareCrook [12]      |  52 |  1 |     39.4 | 2528
ArrayPartitioning [12] |  57 |  2 |     27.6 | 3514
DutchFlag [12]         |  76 |  2 |     52.8 | 3994


We evaluated our work in two ways. Firstly, to evaluate the applicability 
of our certificate generation, we automatically collected all input files with at 
least one procedure from Boogie’s test suite [1] which verify successfully and 
which either use no unsupported features or are easily desugared (by hand) into 
versions without them. This includes programs with procedure calls since Boogie 
simply desugars these in an early stage. For programs employing attributes, we 
checked whether the program still verifies without attributes, and if so we also 
kept these. In total, this yields 100 programs from Boogie’s test suite. Secondly, 
we collected a corpus of ten Boogie programs which verify interesting algorithms 
with non-trivial specifications: three from Boogie’s test suite and seven from the 
literature [12,27]. Where needed we manually desugared usages of Boogie maps 
(which we do not yet support) using type declarations, functions, and axioms. 

Of the 100 programs from Boogie’s test suite, we successfully generate cer- 
tificates in 96 cases. The remaining 4 cases involve special cases that we do not 
handle yet. For 2 of them, extending our work is straightforward: one special 
case includes a naming clash and the other case can be amended by using a more 
specific version of a helper lemma. The remaining two fail because of our incom- 
plete handling of function calls in the VC phase when combined with coercions 
between VC integers or booleans and their Boogie counterparts. Handling this 
is more challenging but is not a fundamental issue. 

For the corpus of 10 examples, Table 1 shows the generated certificate size
and the time for Isabelle to check their validity.¹⁰ The ratio of certificate size to
code size ranges from 41 to 89; this rather large ratio reflects how much work
Boogie’s implementation performs and, correspondingly, how much work is needed
to validate it formally. Optimisations to further reduce the ratio are possible.
The validation of certificates usually takes under one second per line of code.
While these times are not short, they are acceptable since certificate generation
needs to run only for (verified) release versions of the program in question.


10 The time to generate the certificate is not included, but is negligible here.
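
The figures quoted for the corpus can be re-derived from Table 1. The following
Python sketch does so; that the reported range of 41 to 89 results from
rounding the ratios up is our reading of the numbers, not something stated in
the text.

```python
import math

# (name, lines of code, checking time in seconds, certificate size) from Table 1.
TABLE_1 = [
    ("TuringFactorial", 29, 19.4, 1986),
    ("Find", 27, 27.3, 2100),
    ("DivMod", 69, 28.4, 4753),
    ("Summax", 23, 19.1, 1953),
    ("MaxOfArray", 22, 19.9, 1944),
    ("SumOfArray", 22, 18.7, 1534),
    ("Plateau", 50, 22.9, 2019),
    ("WelfareCrook", 52, 39.4, 2528),
    ("ArrayPartitioning", 57, 27.6, 3514),
    ("DutchFlag", 76, 52.8, 3994),
]

# Certificate lines per line of Boogie code, and checking time per line.
size_per_loc = {name: size / loc for name, loc, _, size in TABLE_1}
secs_per_loc = {name: secs / loc for name, loc, secs, _ in TABLE_1}

lowest = min(size_per_loc.values())
highest = max(size_per_loc.values())
under_one_second = [name for name, v in secs_per_loc.items() if v < 1.0]

print(math.ceil(lowest), math.ceil(highest), len(under_one_second))
```

With the values of Table 1, the smallest ratio is that of Plateau and the
largest that of MaxOfArray, and Find is the only example slightly above one
second per line.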


8 Related Work 


Several works explore the validation of program verifiers. Garchery et al. [20] 
validate VC rewritings in the Why3 VC generator [16]. Unlike our work, they do 
not connect VCs with programs and do not handle the erasure of polymorphic 
types. Strub et al. [39] validate part of a previous version of the F* verifier [40] 
by generating a certificate for the F* type checker itself, which type checks 
programs by generating VCs. Like us, they assume the validity of the generated 
VC itself, but they do not consider program-to-program transformations such 
as ours. Another approach is taken by Aguirre [2] who shows how one can map 
proofs of the VC back to correctness of an F* program. They prove a once-and- 
for-all result, but the approach could be lifted to a validation approach using 
the proof-producing capability of SMT solvers [7]. Lifting the approach would 
require extending the work to handle classical instead of constructive VC proofs. 

There is some work on proving VC generator implementations correct once 
and for all, although none of the proven tools are used in practice. Homeier and 
Martin [23] prove a VC generator correct in HOL for an executable language 
and a simpler VC phase than Boogie’s. Herms et al. [22] prove a VC genera- 
tor inspired by Why3 correct in Coq. However, some more-challenging aspects 
of Why3’s VC transformation and polymorphic type system are not handled. 
Vogels et al. [44] prove a toolchain for a Boogie-like language correct in Coq, 
including passification and VC phases. However, the language is quite limited: 
without unstructured control flow, loops (i.e. no need for a CFG-to-DAG phase), 
functions, or polymorphism (i.e. no type encoding). Verifiers other than VC
generators include the verified Verasco static analyzer [25], which supports a
realistic subset of C, but whose performance is not yet on par with unverified, 
industrial analyzers. 

Validation has also been explored in other settings. Alkassar et al. [3] adjust 
graph algorithms to produce witnesses that can be then used by verified valida- 
tors to check whether the result is correct. In the context of compiler correctness, 
many validation techniques express a per-run validator in Coq, prove it correct 
once-and-for-all [8, 41,43], and then extract executable code (the extraction must 
be trusted). In the verified CompCert compiler [34], such validators have been 
used in combination with the once-and-for-all approach. Validators are used for 
phases that can be more easily validated than proved correct once and for all. 
One such example related to our certification of the passification phase is the 
validation of the SSA phase [8], dealing also with versioned variables in the tar- 
get (but not with assume statements that prune executions). In contrast to our 
work, they require an explicit notion of CFG domination and they do not use a 
global versioning scheme to efficiently check that two parts of the CFG constrain 
disjoint versions. Our versioning idea is similar to a technique used for the validation
of a dominator relation in a CFG [9], which assigns intervals to basic blocks
(as opposed to assigning versions to variables) to efficiently determine whether a 
block dominates another one. The validation of the Cogent compiler [38] follows 
a similar approach to ours in that it generates proofs in Isabelle. 


9 Conclusion 


We have presented a novel verifier validation approach, and applied it successfully 
to three key phases of the Boogie verifier, providing formal underpinnings for 
both the language and its verifier for the first time. Our work demonstrates that 
it is feasible to provide strong formal guarantees regarding the verification results 
of practical VC generators written in modern mainstream languages. 

In the future, we plan to extend our supported subset of Boogie, e.g. 
to include procedure calls and bitvectors. Supporting Boogie’s potentially- 
impredicative maps is the main open challenge: maps can take other maps as 
input, potentially including themselves. The challenge with this feature is to 
still be able to express a type in Isabelle capturing all Boogie values despite the 
potentially-cyclic nature of map types. In practice, however, this may not be 
required in full generality: we have observed that Boogie front-ends rarely use 
maps that contain maps of the same type as input. Therefore, we plan to extend 
our technique to support a suitably-expressive restricted form of Boogie maps. 
Abstract. Verification of instruction encoders and decoders is essential
for formalizing manipulation of machine code. The existing approaches 
cannot guarantee the critical consistency property, i.e., that an encoder 
and its corresponding decoder are mutual inverses of each other. We 
observe that consistent encoder-decoder pairs can be automatically 
derived from bijections inherently embedded in instruction formats. 
Based on this observation, we develop a framework for writing specifica- 
tions that capture these bijections, for automatically generating encoders 
and decoders from these specifications, and for formally validating the 
consistency and soundness of the generated encoders and decoders by 
synthesizing proofs in Coq and discharging verification conditions using 
SMT solvers. We apply this framework to a subset of X86-32 instructions 
to illustrate its effectiveness in these regards. We also demonstrate that 
the generated encoders and decoders have reasonable performance. 


Keywords: Formalized instruction formats - Verified parsing - 
Program synthesis - Proof synthesis - Translation validation 


1 Introduction 


Software that manipulates machine code such as compilers, OS kernels and 
binary analysis tools, relies on instruction encoders and decoders for extract- 
ing structural information of instructions from machine code and for translating 
such information back into binary forms. Because of the sheer amount of instruc- 
tions provided by any instruction set architecture (ISA) and the complexity of 
instruction formats, it is extremely tedious and error-prone to implement instruc- 
tion encoders and decoders by hand. Therefore, the literature contains abundant 
work on automatic generation of instruction encoders and decoders, often from 
specifications written in a formal language capable of concisely and accurately 
characterizing instruction formats on various ISAs [7, 12,15]. 

Unfortunately, the above approaches provide few formal guarantees and are
therefore not suitable for rigorous analysis or verification of machine code. In those
settings, instruction encoders and decoders are expected to be consistent, i.e.,
any encoder and its corresponding decoder are inverses of each other, and sound,
i.e., they meet formal specifications of instruction formats that humans can
easily understand and check.


© The Author(s) 2021 
Consistency is essential for verification of machine code because it guarantees
that manipulation and reasoning over the abstract syntax of instructions
can be mirrored precisely onto their binary forms. For example, verification of 
assemblers requires that instruction decoding reverts the assembling (encod- 
ing) process [20]. However, the previously proposed approaches to verifying 
instruction encoders and decoders all fail to establish consistency: to handle the 
complexity of instruction formats (especially that of CISC architectures), they 
employ expressive but ambiguous specifications such as context-free grammars 
or variants of regular expressions, from which it is impossible to derive consistent 
encoders and decoders. A representative example is the bidirectional grammar 
proposed by Tan and Morrisett [18]. It is an extension of regular expressions 
for writing instruction specifications from which verified encoders and decoders 
can be generated. However, because of the ambiguity of such specifications, two 
different abstract instructions may be encoded into the same bit string (i.e., a 
sequence of bits). When the decoder is deterministic, not all encoded instructions 
can be decoded back to the original instructions. 

In this paper, we present an approach to automatic construction of instruc- 
tion encoders and decoders that are verified to be consistent and sound. It is 
based on the observation that an instruction format inherently implies a bijec- 
tion between abstract instructions and their binary forms that manifests as the 
determinacy of instruction decoding in actual hardware. This is true even for the 
most complicated CISC architectures. From a well-designed instruction specifica- 
tion that precisely captures this bijection, we are able to extract an appropriate 
representation of instructions, a pair of instruction encoder and decoder between 
this representation and the binary forms of instructions, and the consistency and 
soundness proofs of the encoder and decoder. 

Based on the above ideas, we develop a framework for automatically generat- 
ing consistent and sound instruction encoders and decoders. It extends the app- 
roach to specifying and generating instruction encoders and decoders proposed 
by Ramsey and Fernandez [15] with mechanisms for validating their soundness 
and consistency by using theorem provers and SMT solvers. The framework con- 
sists of the following components (which are also our technical contributions): 


— A specification language for describing instruction formats. This language 
is deliberately weaker in expressiveness than regular expressions while strong 
enough for describing instruction formats on common ISAs. Different from the 
existing ISA specification languages, it is rich enough for precisely capturing 
the syntactical structures of instructions and their operands, which implicitly 
encode a bijection between the abstract and the binary representations of 
instructions. 

— The algorithms for automatically generating encoders and decoders from 
instruction specifications. Given any instruction specification, they generate 
an abstract syntax of instructions, a partial function from the abstract syntax 
to bit strings (i.e., an encoder) and a partial function from bit strings to the 
abstract syntax (i.e., a decoder). The generated definitions are formalized in 
the Coq theorem prover so that the encoder and decoder can be formally
validated later. 

— The algorithms for automatically validating the consistency and soundness of 
the generated encoders and decoders. Given any instruction specification, they 
synthesize the consistency and soundness proofs for the generated encoder 
and decoder in Coq. This is possible because the bijection implied by the 
original specification guarantees that the encoder and decoder are inverses 
of each other, under the requirement that the binary “shapes” of different 
instructions or operands do not overlap with each other. This requirement is 
inherently satisfied by any instruction format, and can be easily proved with 
SMT solvers. 


To demonstrate the effectiveness of our framework, we have applied it to a 
subset of 32-bit X86 instructions. In the rest of this paper, we first introduce 
relevant background information for this work and discuss the inadequacy of the 
existing work in Sect. 2. We then give an overview of our framework in Sect. 3 
by further elaborating on the points above. After that, we discuss the definition 
of our specification language and the ideas supporting its design in Sect. 4. In 
the two subsequent sections Sect. 5 and Sect. 6, we discuss the algorithms for 
automatically generating and validating encoders and decoders. In Sect. 7, we 
present the evaluation of our framework. Finally, we discuss related work and 
conclude in Sect. 8. 


2 Background 


For our approach to work, the specification language we use must support the 
instruction formats on contemporary RISC and CISC architectures. In this 
section, we first introduce the key characteristics of these formats and then 
present a running example. We conclude this section by exposing the inadequacy 
of the existing approaches in capturing the bijections between the abstract and 
binary forms of instructions. 


2.1 The Characteristics of Instruction Formats 


Fig. 1. The format of 32-bit X86 instructions 


Instruction formats on CISC architectures may vary in length and structure 
even for the same type of instructions and may contain complex dependencies 
between their operands. In contrast, instructions on RISC architectures usually
have fixed formats which are largely subsumed by CISC formats. Therefore, we
focus on handling CISC formats in this paper.

We use the format of 32-bit X86 instructions as an example to illustrate the 
complex characteristics of CISC instructions. It is depicted in Fig. 1. An instruc- 
tion is divided into a sequence of tokens where each token is one or more bytes 
playing a particular role. The first token Opcode partially or fully determines 
the basic type of the instruction; it may be one to three bytes long. Following
Opcode is a one-byte token ModRM. ModRM is further divided into
a sequence of fields where a field f[n1 : n2] represents a segment of the token
named f that occupies the n2-th to n1-th bits in that token. Depending on the
value of Opcode, ModRM may or may not exist. When it exists, the value 
of Reg_op[5:3] may contain the encoded representation of a register operand. 
Another operand of the instruction may be an addressing mode. It is collectively 
determined by the values of Mod[7:6], RM[2:0], the token SIB (scaled index 
byte) and the displacement Disp following ModRM. Finally, the instruction 
may have an operand of immediate values in the token Imms. 
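
The field notation can be illustrated with a short Python sketch that extracts
f[n1 : n2] from a one-byte token by shifting and masking. The function names
field and split_modrm are ours and serve only as an illustration.

```python
def field(token: int, n1: int, n2: int) -> int:
    """Return the field token[n1 : n2], i.e. bits n2 up to n1 (inclusive)."""
    width = n1 - n2 + 1
    return (token >> n2) & ((1 << width) - 1)


def split_modrm(modrm: int) -> tuple:
    """Split a ModRM byte into the fields Mod[7:6], Reg_op[5:3] and RM[2:0]."""
    return field(modrm, 7, 6), field(modrm, 5, 3), field(modrm, 2, 0)


# 0x1C is 0b00_011_100: Mod = 0b00, Reg_op = 0b011, RM = 0b100.
mod, reg_op, rm = split_modrm(0x1C)
```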

For simplicity of our discussion, we have omitted some details such as the 
optional prefixes of instructions in Fig. 1. However, this simplified form is already 
enough to expose the key characteristics and complexity of CISC instruction 
formats (some of which also manifest in RISC). We summarize them below: 


1. Instructions as Composition of Components: At the abstract level, an instruc- 
tion consists of a collection of components. Each component serves a specific 
purpose and concretely corresponds to certain fields or tokens in the instruc- 
tion format. For example, the constituents of 32-bit X86 instructions can be 
classified into four different kinds of components (marked with different colors 
in Fig. 1): the component determining the types of instructions (Opcode), 
the component denoting register operands (Reg_op[5:3]), the component 
denoting addressing modes (Mod[7:6], RM[2:0], SIB and Disp) and the 
component denoting immediate values (Imms). 

2. Variance of Components: The concrete forms of components vary in different 
ways. A component may correspond to a single token (e.g., Opcode and 
Imms), a single field (e.g., Reg_op[5:3]), a mixing of fields and tokens (e.g., 
addressing modes), or other forms not shown here. Moreover, the abstract 
and concrete forms of a single type of components can also vary significantly 
such as the different addressing modes supported by X86 (as we shall see in 
detail in the following section). 

3. Interleaving of Components: In most cases, there are clear sequential orders
between the concrete representations of components. For example, the com- 
ponent of addressing modes immediately follows that of opcode and precedes 
that of immediate values. In the other cases, components may be interleaved 
with each other. For example, the component of register operands is inter- 
leaved with the component of addressing modes. 

4. Dependencies between and in Components: The existence and forms of compo- 
nents are affected by the dependencies between each other and between their 
own fields or tokens. For example, if an instruction does not take any argument,
then the value of its Opcode determines that there is no token following
Opcode. As another example, when Mod[7:6] contains the value 0b11,
the addressing mode is simply a register operand. Otherwise, the addressing
mode may further depend on the values in SIB and Disp.


Note that, despite the above complexity, an instruction format is designed to 
inherently embed a (partial) bijection between the binary forms of instructions 
and their abstract representation as the composition of components. This is to 
ensure the determinacy of instruction decoding in hardware. This bijection is 
the central property to be investigated in this work. 


2.2 A Running Example 


Table 1. The different forms of addressing modes 


AddrMode    | Mod  | RM                    | Scale | Index     | Base      | Disp
r           | 0b11 | r                     | -     | -         | -         | -
(r)         | 0b00 | r ≠ 0b100 ∧ r ≠ 0b101 | -     | -         | -         | -
(d)         | 0b00 | 0b101                 | -     | -         | -         | d
(s * i + b) | 0b00 | 0b100                 | s     | i ≠ 0b100 | b ≠ 0b101 | -


We present an example of encoding the add instruction to concretely illustrate 
the characteristics of the X86 instruction format. It will be used as a running 
example for the rest of the paper. The operands of add may have many forms. 
For simplicity, we only consider two cases: 1) the first operand is a register while 
the second one is an addressing mode, and 2) the first operand is an addressing 
mode while the second one is an immediate value. 

In the first case, Opcode is 0x03, indicating that ModRM exists and the 
first operand is encoded in its Reg_op field. The addressing mode has over 23 
combinations because of the dependencies and constraints over their fields. We 
list only some of the combinations in Table 1, where - indicates that this field
or token does not exist. The first row shows the direct addressing mode r where
Mod is 0b11 and RM contains the encoded register operand r. The following
three rows show different kinds of indirect addressing modes. They are valid
only if Mod is 0b00 and further constraints are satisfied. For example, the 
second row shows the indirect addressing mode (r) where r is encoded in RM. 
In this case, r must neither be ESP (encoded as 0b100) nor be EBP (encoded 
as 0b101). Similarly, the addressing mode (s * i + b) requires that RM must be 
0b100, Index must not be 0b100 and Base must not be 0b101. 

In the second case, Opcode is 0x81, indicating that ModRM exists, the 
first operand is an addressing mode, and the second operand is an immediate 
value following it. Here, Reg_op must be 0b000.
(a) add (4,%ecx,%esp), %ebx


(b) add 0x88, %ebx


(c) add $0x66, (%ebx)


Fig. 2. Some concrete examples of instruction encoding


We demonstrate the concrete examples of encoding add (4,%ecx,%esp), %ebx,
add 0x88, %ebx and add $0x66, (%ebx) in Fig. 2 where %ebx and %ecx are
encoded into 0b011 and 0b001, respectively (the order of operands is reversed 
because we use the AT&T assembly syntax). Note how the forms of operands 
change significantly depending on the different values in the related fields. Note 
also, despite such complex dependencies, a bit string representing a valid add 
instruction corresponds to a unique combination of components. 
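
The two cases of add and the rows of Table 1 can be turned into a small encoder
and decoder pair. The sketch below is hand-written Python for illustration, not
output of our framework; the tuple representation of instructions and all
function names are assumptions made here, and displacements and immediates are
assumed to be 32-bit little-endian values as on X86-32.

```python
import struct

SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}
SCALE_OF = {bits: scale for scale, bits in SCALE_BITS.items()}


def modrm(mod, reg_op, rm):
    return (mod << 6) | (reg_op << 3) | rm


def encode_addr(reg_op, am):
    """Encode ModRM (plus SIB or Disp) for one addressing mode of Table 1."""
    kind = am[0]
    if kind == "reg":                                   # r
        return bytes([modrm(0b11, reg_op, am[1])])
    if kind == "ind" and am[1] not in (0b100, 0b101):   # (r)
        return bytes([modrm(0b00, reg_op, am[1])])
    if kind == "disp":                                  # (d)
        return bytes([modrm(0b00, reg_op, 0b101)]) + struct.pack("<I", am[1])
    if kind == "sib" and am[2] != 0b100 and am[3] != 0b101:   # (s * i + b)
        _, s, i, b = am
        sib = (SCALE_BITS[s] << 6) | (i << 3) | b
        return bytes([modrm(0b00, reg_op, 0b100), sib])
    raise ValueError("addressing mode outside the rows of Table 1")


def decode_addr(bs):
    """Inverse of encode_addr; returns (reg_op, addressing mode, remaining bytes)."""
    mod, reg_op, rm = bs[0] >> 6, (bs[0] >> 3) & 7, bs[0] & 7
    if mod == 0b11:
        return reg_op, ("reg", rm), bs[1:]
    if mod == 0b00 and rm == 0b101:
        return reg_op, ("disp", struct.unpack("<I", bs[1:5])[0]), bs[5:]
    if mod == 0b00 and rm == 0b100:
        s, i, b = bs[1] >> 6, (bs[1] >> 3) & 7, bs[1] & 7
        if i != 0b100 and b != 0b101:
            return reg_op, ("sib", SCALE_OF[s], i, b), bs[2:]
    elif mod == 0b00:
        return reg_op, ("ind", rm), bs[1:]
    raise ValueError("bit string outside the rows of Table 1")


def encode(instr):
    if instr[0] == "add_r_rm":        # Opcode 0x03: register and addressing mode
        _, reg, am = instr
        return b"\x03" + encode_addr(reg, am)
    if instr[0] == "add_rm_imm":      # Opcode 0x81: Reg_op must be 0b000
        _, am, imm = instr
        return b"\x81" + encode_addr(0b000, am) + struct.pack("<I", imm)
    raise ValueError("unsupported instruction")


def decode(bs):
    reg_op, am, rest = decode_addr(bs[1:])
    if bs[0] == 0x03 and not rest:
        return ("add_r_rm", reg_op, am)
    if bs[0] == 0x81 and reg_op == 0b000 and len(rest) == 4:
        return ("add_rm_imm", am, struct.unpack("<I", rest)[0])
    raise ValueError("not one of the two add forms")


EBX, ECX, ESP = 0b011, 0b001, 0b100
EXAMPLES = [
    ("add_r_rm", EBX, ("sib", 4, ECX, ESP)),   # add (4,%ecx,%esp), %ebx
    ("add_r_rm", EBX, ("disp", 0x88)),         # add 0x88, %ebx
    ("add_rm_imm", ("ind", EBX), 0x66),        # add $0x66, (%ebx)
]
```

On these three instructions, decode(encode(x)) returns x again, and the side
conditions of Table 1 are what keep the four addressing-mode shapes from
overlapping.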


2.3 Inadequacy of the Existing Approaches 


The existing approaches to specifying instructions are either 1) too general and 
allow ambiguity or 2) too low-level and break the component-based abstrac- 
tion we just described. Either way, they fail to capture the inherent bijection 
embedded in an instruction format. 

The bidirectional grammars [18] demonstrate the first kind of inadequacy. 
They contain the alternation grammar Alt g1 g2 for matching a bit string s
when either the sub-grammar g1 or g2 matches s. The ambiguity arises when
both g1 and g2 match s: in this case, the same s corresponds to two different
internal representations. Therefore, bidirectional grammars cannot encode bijec- 
tions in general. The same can be said for other work on verified parsing based 
on ambiguous grammars. We shall discuss them in detail in Sect. 8. 
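
The ambiguity can be reproduced with a toy model of alternation in Python. The
model is deliberately simplistic and is not the Coq definition of bidirectional
grammars [18]: a grammar is represented as a function from a bit string to the
list of internal representations it yields, and the tagging of results by
branch as well as the names lit and alt are assumptions of this sketch.

```python
def lit(bits, value):
    """A grammar matching exactly the bit string `bits` and yielding `value`."""
    return lambda s: [value] if s == bits else []


def alt(g1, g2):
    """Alternation: s is matched when g1 or g2 matches; results record the branch."""
    return lambda s: [("left", v) for v in g1(s)] + [("right", v) for v in g2(s)]


# Both sub-grammars accept the same bit string.
g = alt(lit("0101", "A"), lit("0101", "B"))
results = g("0101")
```

Here results contains two distinct internal representations for the single bit
string 0101, so no deterministic decoder can be an inverse of the encoder on
both of them.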

The Specification Language for Encoding and Decoding (or SLED) demon- 
strates the second kind of inadequacy [15]. It is a language for describing trans- 
lations between symbolic and binary representations of machine instructions. 
On the surface, SLED takes the component-based view in specifying instruc- 
tions. However, SLED specifications are interpreted through a normalization 
process by which every component is flattened into a sequence of tokens. After 
that, the structural information of components is completely lost. As a result, 
users can only derive encoders from the normalized specifications. They need 
to write decoders by using completely different specifications called “matching 
statements.” This inability to generate matching encoders and decoders from a 
single specification is a common phenomenon in other approaches to ISA speci- 
fications. 
In summary, no existing approach can precisely capture the bijections inher-
ently embedded in instruction formats. This is the main intellectual problem we 
try to tackle in this paper. We shall elaborate on our solution to this problem 
in the remaining sections. 


3 An Overview of the Framework 


S: instruction specifications (in CSLED)

G: algorithms for generating formal definitions and proofs (in C++)

A: abstract syntax of instructions (on paper and in Coq)

$\mathbb{S}$: relational specifications of instructions (on paper and in Coq)

E and D: encoders and decoders (in Coq)


Fig. 3. The framework 


We develop a framework for automatic generation of verified encoders and 
decoders that are consistent and sound. It is depicted in Fig. 3. To generate
formally verified encoders and decoders, users first need to write down a speci- 
fication of instructions S in a language called CSLED (or CoreSLED). CSLED 
is an enhancement to SLED for characterizing the bijection between the binary 
forms and the abstract syntax of instructions. Roughly speaking, S consists of a 
collection of class definitions, each of which defines a unique type of components 
that form instructions or their operands; the “top-most” class defines the type of 
instructions. Each class is associated with a set of patterns to uniquely determine 
a bijection between the binary and abstract forms of components in that class. 
Note that this bijection exists only when certain well-formedness conditions for 
patterns are satisfied. We shall elaborate on these ideas in Sect. 4. 
From S, the following definitions are generated and translated into Coq: 


- The abstract syntax of instructions A. It is a collection of algebraic data types
corresponding to the classes defined in S.

- A relational specification of S, written $\mathbb{S}$. For each class, $\mathbb{S}$ contains a binary
predicate that precisely captures the relation between components of that
class and their binary forms. We write $R\llbracket K \rrbracket\ k\ l$ to denote that the component
$k$ of class $K$ has the binary form $l$.
Then, S is fed into a collection of algorithms G to generate the following
definitions and proofs in Coq:


- An encoder E and a decoder D. The encoder is a set of partial functions, one
for each class, from the abstract syntax of that class to bit strings. We write
$E_K(k) = \lfloor l \rfloor$ to denote that $l$ is the result of encoding a component $k$ of class
$K$, where $\lfloor \cdot \rfloor$ denotes the Some constructor of the option type. Conversely, the
decoder is a set of partial functions from bit strings to the abstract syntax.
We write $D_K(l \mathbin{+\!\!+} l') = \lfloor (k, l') \rfloor$ to denote the decoding of the bit string $l$ into
a component $k$ of class $K$, where $\mathbin{+\!\!+}$ is the append operation of bit strings.
Here, the trailing bit string $l'$ represents the remaining bits after decoding the
first component.

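- The proof of consistency between the encoder and decoder. The consistency
theorems are stated as the mutual inversion between the encoder and decoder:

$$\forall K\, k\, l\, l',\ E_K(k) = \lfloor l \rfloor \Longrightarrow D_K(l \mathbin{+\!\!+} l') = \lfloor (k, l') \rfloor.$$
$$\forall K\, k\, l\, l',\ D_K(l \mathbin{+\!\!+} l') = \lfloor (k, l') \rfloor \Longrightarrow E_K(k) = \lfloor l \rfloor.$$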


Their Coq proofs are automatically generated by inspecting the logical struc- 
ture of classes and patterns in S. For this, we need to derive a very important 
property: the decoder always decodes a bit string l back to the same sequence 
of components. We achieve this goal by combining proofs in Coq with SMT 
solving of verification conditions that are automatically derived from well- 
formed specifications. 

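- The proof of soundness of the encoder and decoder. The soundness theorems
are stated as follows:

$$\forall K\, k\, l,\ E_K(k) = \lfloor l \rfloor \Longrightarrow R\llbracket K \rrbracket\ k\ l.$$
$$\forall K\, k\, l\, l',\ D_K(l \mathbin{+\!\!+} l') = \lfloor (k, l') \rfloor \Longrightarrow R\llbracket K \rrbracket\ k\ l.$$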

As we shall see later, $E_K$ and $R\llbracket K \rrbracket$ are both defined recursively on the
definition of classes in S. Their main difference is that the former is a function
while the latter is a relation. Therefore, it is easy to prove the first soundness 
theorem by induction on k. By using the second consistency theorem and the 
first soundness theorem, we can easily prove the second soundness theorem. 
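
The two consistency properties can be read as executable round-trip checks. The sketch below tests them exhaustively on a deliberately tiny, hypothetical class (a 3-bit register number); it is not the generated Coq code, and all function names are illustrative.

```python
from itertools import product
from typing import Iterator, Optional, Tuple

WIDTH = 3  # hypothetical class: a component is a 3-bit register number


def encode(k: int) -> Optional[str]:
    """Partial encoder: returns None outside the class."""
    return format(k, "03b") if 0 <= k < 2 ** WIDTH else None


def decode(bits: str) -> Optional[Tuple[int, str]]:
    """Partial decoder: returns the component and the remaining bits."""
    if len(bits) < WIDTH:
        return None
    return int(bits[:WIDTH], 2), bits[WIDTH:]


def all_bitstrings(max_len: int) -> Iterator[str]:
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)


def encode_then_decode_ok(max_tail: int = 2) -> bool:
    """First property: decoding an encoding (plus any tail) recovers k."""
    for k in range(2 ** WIDTH):
        l = encode(k)
        for tail in all_bitstrings(max_tail):
            if l is None or decode(l + tail) != (k, tail):
                return False
    return True


def decode_then_encode_ok(max_len: int = 5) -> bool:
    """Second property: re-encoding a decoded k yields the consumed bits."""
    for bits in all_bitstrings(max_len):
        out = decode(bits)
        if out is not None:
            k, tail = out
            consumed = bits[: len(bits) - len(tail)]
            if encode(k) != consumed:
                return False
    return True
```

Exhaustive testing of this kind only works for toy classes; the framework instead proves the properties once and for all in Coq.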


As we shall see in the following sections, the actual implementations of encoders
and decoders and their consistency and soundness theorems are more complicated
than presented here. Nevertheless, the above discussion covers the high-level
ideas of our framework.


Note that in Fig. 3, S and G are not formalized and hence not in the trusted
base. The consistency and soundness of E and D are independently validated
by using Coq and SMT solvers. If the validation of either property fails, the
framework reports a failed attempt to generate the encoder and decoder. This 
often indicates that the instruction specification is not well-formed. 
4 The Specification Language


The key idea underlying the design of CSLED is to record explicitly the struc- 
tures of components in instruction specifications, instead of normalizing them 
into tokens as is done in SLED. In this way, CSLED specifications accurately capture
the key characteristics of instruction formats described in Sect. 2.1, hence
the bijections embedded in instruction formats. In this section, we present the 
syntax of CSLED, explain the ideas underlying its design, and use the run- 
ning example to illustrate how CSLED specifications are written. We also intro- 
duce the syntactical and relational interpretations of CSLED specifications and 
present the well-formedness conditions for the bijections to exist. 


4.1 The Syntax 


(a) Definitions

S ::= (empty)
    | S D
D ::= token tid = T;
    | field fid = F;
    | class kid = K;
T ::= (n)
F ::= tid (n1 : n2)
K ::= B
    | K | B
B ::= constr cid [aid] (P)

(b) Patterns

P ::= J
    | P ; J
J ::= A
    | J & A
A ::= O
    | cls %i
O ::= ε : tid
    | fid = n
    | fid ≠ n
    | fld %i
    | O & O
    | O ; O


Fig. 4. The syntax of CSLED 


The syntax of CSLED is shown in Fig. 4. A CSLED specification (denoted by
S) consists of a list of definitions (denoted by D). The three kinds of definitions
are for tokens (denoted by T), fields (denoted by F) and classes (denoted by K).
Every definition is bound to a unique identifier where tid, fid and kid represent
the identifiers of tokens, fields and classes, respectively.

Tokens represent consecutive segments of bytes and are the basic elements for 
forming instructions. They are necessary for distinguishing the same sequence of 
bytes with different interpretations. Their definitions have the form (n), where n
must be divisible by 8, which denotes a token of n bits or n/8 bytes. Definitions
of fields have the form tid ($n_1$ : $n_2$), which denotes a field occupying the $n_1$-th to
$n_2$-th bits in the token tid.
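
A field definition amounts to a bit range inside a token. The following Python sketch shows one way to read and write such a range; it assumes that bit 0 is the least significant bit and that the first index of the range is the higher one, as in definitions like ModRM(7 : 6) of the running example. The names `read_field` and `write_field` are hypothetical.

```python
def read_field(token: int, hi: int, lo: int) -> int:
    """Return the value stored in bits hi..lo (inclusive) of a token."""
    width = hi - lo + 1
    return (token >> lo) & ((1 << width) - 1)


def write_field(token: int, hi: int, lo: int, value: int) -> int:
    """Return a copy of the token with bits hi..lo replaced by value."""
    width = hi - lo + 1
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the field")
    mask = ((1 << width) - 1) << lo
    return (token & ~mask) | (value << lo)


# An 8-bit token with the three fields mod (7:6), reg_op (5:3) and rm (2:0).
token = 0b00011100
fields = (read_field(token, 7, 6), read_field(token, 5, 3), read_field(token, 2, 0))
```

Writing a field and then reading it back returns the written value, which is the basic inversion that the generated encoders and decoders rely on.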
Classes represent specific types of components. They play a central role in
the specifications by accurately capturing the component-based abstraction we 
discussed in Sect. 2.1. A class consists of a collection of branches (denoted by B) 
each of which denotes a possible form of components in the class. Definitions of 
branches have the form constr cid [aid] (P) where cid is a unique identifier for 
the branch (denoting a constructor) and [aid] is a list of fid or kid denoting the 
sub-components or fields for constructing a component (i.e., the arguments to 
the constructor). These arguments capture the nested structures of components 
where a bigger component may be constructed from smaller ones or basic fields. 

A branch is associated with a single pattern P. A pattern plays two roles: it 
determines the types of a sequence of tokens that concretely forms components 
of this branch, and it describes a relation between these tokens (and their fields) 
with the abstract arguments of the branch. This relation essentially encodes the 
bijection between the abstract and binary forms of components in this branch. 

At the top-most level, P is a sequence of judgments (denoted by J) separated
by ;, such that $J_1; \ldots; J_n$ matches a sequence of tokens concretely represented
by a bit string $l$ if and only if $l = l_1 \mathbin{+\!\!+} l_2 \mathbin{+\!\!+} \cdots \mathbin{+\!\!+} l_n$ and $J_i$ matches $l_i$ for
$1 \le i \le n$. This sequential pattern is enough for relating abstract and binary forms
of components when each $J_i$ (and $l_i$) corresponds to a single (sub-)component.
However, according to the discussion in Sect. 2.1, components may be interleaved
with each other and $J_i$ may correspond to multiple components. Therefore, a
judgment is a conjunction of atomic patterns (denoted by A) each of which 
matches an interleaved component. In case there is no interleaving, a judgment 
reduces to a single atomic pattern. 

An atomic pattern has two forms: cls %i for relating a sequence of tokens 
to the i-th argument in [aid] of the corresponding branch which must be a class, 
and O for relating tokens to field arguments in [aid] and for further constraining 
the fields of these tokens. The O patterns are called basic patterns. Among them
ε : tid matches any token of type tid; fid = n (fid ≠ n) matches a token with
the field fid whose value is (is not) the constant n; similar to cls %i, fld %i 
relates the i-th argument in [aid] of the branch which must be a field to the 
concrete value of the field in the matching token. The last two cases of basic 
patterns indicate that arbitrary sequencing and interleaving of basic patterns 
are allowed. Despite such free interleaving, a basic pattern can only match with 
sequences of tokens of the same length and of a unique type because we require 
that O1 & O2 be well-formed only if both O1 and O2 match sequences of tokens
with the same type. Therefore, basic patterns have the same expressiveness as 
SLED specifications in their normalized forms [15]. 

In contrast to basic patterns, judgments and atomic patterns are much more 
expressive as they may match tokens of different lengths and forms. This is 
because a class pattern cls %i can match components of a class K with mul- 
tiple branches, each of which may have different patterns. By introducing class 
patterns into atomic patterns, we are able to represent the complete structures 
of components and establish bijections from these structures. This is the key 
improvement we made in CSLED compared to SLED. 
4.2 The CSLED Specification of the Running Example


token Opcode = (8);  token Disp = (32);  token Imms = (32);
token ModRM = (8);   token SIB = (8);

field opcode = Opcode(7 : 0);  field disp = Disp(31 : 0);
field imms = Imms(31 : 0);     field mod = ModRM(7 : 6);
field reg_op = ModRM(5 : 3);   field rm = ModRM(2 : 0);
field scale = SIB(7 : 6);      field index = SIB(5 : 3);
field base = SIB(2 : 0);

class Addrmode =
| constr addr_r [rm] (mod = 0b11 & fld %1)
| constr addr_ir [rm] (mod = 0b00 & rm ≠ 0b100 & rm ≠ 0b101 & fld %1)
| constr addr_disp [disp] (mod = 0b00 & rm = 0b101; fld %1)
| constr addr_sib [scale, index, base]
    (mod = 0b00 & rm = 0b100;
     fld %1 & fld %2 & fld %3 & index ≠ 0b100 & base ≠ 0b101)

class Instruction =
| constr AddGvEv [reg_op, Addrmode] (opcode = 0x03; fld %1 & cls %2)
| constr AddEvIz [Addrmode, imms]
    (opcode = 0x81; reg_op = 0b000 & cls %1; fld %2)


Fig. 5. The CSLED specification of the running example 


The CSLED specification of our running example is depicted in Fig. 5. The
Addrmode class specifies the possible addressing modes. Its branches are trans- 
lated from the addressing modes described in Table 1 one by one, such that their 
patterns exactly match the binary structures of components in the correspond- 
ing branches. For instance, the branch addr_sib is translated from the fourth 
addressing mode in Table 1. Its pattern is a sequence of two judgments. The first
judgment is a conjunction of two basic patterns that are the required constraints 
on the fields mod and rm of ModRM described in Table 1. Therefore, it must 
match the single token ModRM. The second judgment is a conjunction of basic 
patterns that constrain the fields index and base of SIB and relate arguments 
of addr_sib with the concrete values in the fields scale, index and base. Because 
these patterns all constrain the fields of SIB, the second judgment must match 
the single token SIB. 

Similarly, the Instruction class specifies the instructions. Its two branches 
characterize the two kinds of add instructions described in Sect. 2.2. Note
how conjunctions between the basic patterns for reg_op and class patterns for 
Addrmode are used to describe the interleaving of register operands and address- 
ing modes. Note also that in every branch of Addrmode the first pattern matches 
the token ModRM, and in any branch of Instruction the token Opcode is always
followed by Addrmode. Therefore, ModRM always follows Opcode as desired. 

By this example, we demonstrate the critical feature of CSLED: because the 
syntax of CSLED is designed to precisely describe instruction formats in ISA 
manuals, it implicitly captures the embedded bijections. Note that, because of 
its faithfulness to the ISA manuals, CSLED’s syntax contains full details about 
instruction encoding by nature. However, it is not hard to imagine this syntax 
being refined to the client’s syntax through another straightforward bijection. In 
fact, this is how we anticipate clients will use CSLED in practice, e.g., to build 
verified assemblers for X86. 


4.3 Interpretation of CSLED Specifications 


From a CSLED specification S, we extract 1) a collection of data types for 
representing the abstract syntax of components, and 2) a collection of binary 
relations between these data types and bit strings for representing the mappings 
between the abstract and concrete forms of components. 


Data Types of Components. We use the operator $T\llbracket - \rrbracket$ to denote the interpretation
of basic fields and classes into data types. The translation for fields is
simple: given a field definition field fid = tid ($n_1$ : $n_2$), $T\llbracket fid \rrbracket = (n_1 - n_2 + 1)$,
where $(n)$ represents an unsigned binary integer of $n$ bits. Note that we do not
further translate the values of fields as they have straightforward interpretations
(such as the mapping from bits to registers described in Sect. 2.1). The
interpretation of classes is only slightly more involved. Given a class definition
class kid = K, $T\llbracket kid \rrbracket$ is an algebraic data type named kid. For each branch
constr cid [$aid_1, \ldots, aid_n$] (P) of K, there is a constructor cid for kid that takes
$n$ arguments of types $T\llbracket aid_1 \rrbracket, \ldots, T\llbracket aid_n \rrbracket$.
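
To make the interpretation concrete, the data types for the running example could be mirrored in Python as below. This is only an analogy to the algebraic data types the framework generates in Coq; the Python class names are hypothetical, and field values are kept as plain integers rather than fixed-width bit vectors.

```python
from dataclasses import dataclass
from typing import Union


# One class per branch (constructor) of Addrmode.
@dataclass(frozen=True)
class AddrR:
    rm: int          # 3-bit field


@dataclass(frozen=True)
class AddrIr:
    rm: int          # 3-bit field


@dataclass(frozen=True)
class AddrDisp:
    disp: int        # 32-bit field


@dataclass(frozen=True)
class AddrSib:
    scale: int       # 2-bit field
    index: int       # 3-bit field
    base: int        # 3-bit field


Addrmode = Union[AddrR, AddrIr, AddrDisp, AddrSib]


# One class per branch of Instruction; arguments follow the [aid] lists.
@dataclass(frozen=True)
class AddGvEv:
    reg_op: int
    addrmode: Addrmode


@dataclass(frozen=True)
class AddEvIz:
    addrmode: Addrmode
    imms: int


Instruction = Union[AddGvEv, AddEvIz]

# add (4,%ecx,%esp), %ebx, with %esp assumed to be register 0b100
example = AddGvEv(0b011, AddrSib(0b10, 0b001, 0b100))
```

Each constructor takes exactly the arguments listed in its branch, so the nesting of components in the specification is preserved in the abstract syntax.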


Relations Derived from CSLED. The translation of CSLED specifications
into relations is defined in Fig. 6. Here, BS denotes the type of bit strings.
When $aids = [aid_1, \ldots, aid_n]$ we write $T\llbracket aids \rrbracket$ to denote the product type of
$T\llbracket aid_1 \rrbracket, \ldots, T\llbracket aid_n \rrbracket$. We use ::= to denote the definitional equality.

The function $R\llbracket aid \rrbracket$ translates a type of components associated with aid into
a binary relation between its abstract representation and bit strings, where aid
may denote a field or a class. The definition for field components is straightforward.
$R\llbracket kid \rrbracket\ k\ l$ holds iff there is a branch of kid whose interpretation relates $k$
and $l$, which further requires (by the third rule in Fig. 6) that $k$ is constructed
by using the constructor of that branch and the pattern of the branch relates
the arguments of the constructor to $l$. The latter relation is defined by $R_p\llbracket -, - \rrbracket$
such that $R_p\llbracket P, aids \rrbracket\ args\ l$ holds iff $P$ matches $l$ and the arguments args satisfy
the constraints enforced by $P$ and aids. More specifically, $R_p\llbracket P; J, aids \rrbracket\ args\ l$
holds iff $P$ matches a prefix of $l$ and $J$ matches the rest of $l$. The definition of
$R_p\llbracket J \& A \rrbracket$ is slightly different in that $R_p\llbracket J \& A, aids \rrbracket\ args\ l$ holds iff $A$ matches
the whole $l$ and $J$ matches a prefix of $l$. This is necessary for describing the
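\[
\begin{array}{lcl}
R\llbracket fid \rrbracket & ::= & \lambda (f : T\llbracket fid \rrbracket)\ (l : BS).\ \exists (tid\ n_1\ n_2\ n_3),\ tid = (n_3) \land fid = tid\ (n_1 : n_2) \\
 & & \quad \land\ \mathrm{length}(l) = n_3 \land l[n_1 : n_2] = f \\
R\llbracket kid \rrbracket & ::= & \lambda (k : T\llbracket kid \rrbracket)\ (l : BS).\ \exists B,\ kid = \ldots \mid B \mid \ldots \land R_b\llbracket B, kid \rrbracket\ k\ l \\
R_b\llbracket B, kid \rrbracket & ::= & \lambda (k : T\llbracket kid \rrbracket)\ (l : BS).\ \exists args,\ k = cid\ args \land R_p\llbracket P, aids \rrbracket\ args\ l \\
 & & \quad (\text{where } B = \mathtt{constr}\ cid\ aids\ P) \\
R_p\llbracket P; J, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ \exists l_1\ l_2,\ l = l_1 \mathbin{+\!\!+} l_2 \\
 & & \quad \land\ R_p\llbracket P, aids \rrbracket\ args\ l_1 \land R_p\llbracket J, aids \rrbracket\ args\ l_2 \\
R_p\llbracket J \& A, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ \exists l_1\ l_2,\ l = l_1 \mathbin{+\!\!+} l_2 \\
 & & \quad \land\ R_p\llbracket J, aids \rrbracket\ args\ l_1 \land R_p\llbracket A, aids \rrbracket\ args\ l \\
R_p\llbracket \epsilon : tid, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ \exists n,\ tid = (n) \land \mathrm{length}(l) = n \\
R_p\llbracket fid = n, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ \exists (tid\ fid\ n_1\ n_2\ n_3),\ tid = (n_3) \\
 & & \quad \land\ fid = tid\ (n_1 : n_2) \land \mathrm{length}(l) = n_3 \land l[n_1 : n_2] = n \\
R_p\llbracket fid \neq n, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ \exists (tid\ fid\ n_1\ n_2\ n_3),\ tid = (n_3) \\
 & & \quad \land\ fid = tid\ (n_1 : n_2) \land \mathrm{length}(l) = n_3 \land l[n_1 : n_2] \neq n \\
R_p\llbracket \mathtt{fld}\ \%i, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ R\llbracket aids[i] \rrbracket\ args[i]\ l \\
R_p\llbracket \mathtt{cls}\ \%i, aids \rrbracket & ::= & \lambda (args : T\llbracket aids \rrbracket)\ (l : BS).\ R\llbracket aids[i] \rrbracket\ args[i]\ l
\end{array}
\]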


Fig. 6. Translation of CSLED specifications into relations 


interleaving of components. Furthermore, certain constraints need to be satisfied
for deriving a bijection, as we shall discuss in Sect. 4.4. $R_p\llbracket O_1 ; O_2, aids \rrbracket$ and
$R_p\llbracket O_1 \& O_2, aids \rrbracket$ are not shown in Fig. 6 because they are defined the same as
$R_p\llbracket P; J, aids \rrbracket$ and $R_p\llbracket J \& A, aids \rrbracket$, respectively. $R_p\llbracket fid = n, aids \rrbracket\ args\ l$ holds
iff $l$ is a token containing fid whose value is $n$; similarly for $R_p\llbracket fid \neq n, aids \rrbracket$.
$R_p\llbracket \mathtt{fld}\ \%i, aids \rrbracket$ holds iff the $i$-th argument in args matches the concrete
value found in $l$; the same holds for $R_p\llbracket \mathtt{cls}\ \%i, aids \rrbracket$. Note how the last two definitions
make use of args for getting the values of arguments.
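
As a rough illustration of what such a relation looks like once unfolded, the Python predicate below spells out the relation for the pattern of the addr_sib branch by hand. It is a sketch, not the Coq definition produced by the framework: bit strings are modelled as Python strings of '0' and '1' with the most significant bit first, and the helper `field` is hypothetical.

```python
from typing import Tuple


def field(token: str, hi: int, lo: int) -> int:
    """Value of bits hi..lo of an 8-bit token given as a bit string."""
    return int(token[7 - hi: 8 - lo], 2)


def rel_addr_sib(args: Tuple[int, int, int], l: str) -> bool:
    """Relation for (mod = 0b00 & rm = 0b100;
                     fld %1 & fld %2 & fld %3 & index != 0b100 & base != 0b101)."""
    if len(l) != 16:                      # l = l1 ++ l2 with two 8-bit tokens
        return False
    modrm, sib = l[:8], l[8:]
    scale, index, base = args
    first = field(modrm, 7, 6) == 0b00 and field(modrm, 2, 0) == 0b100
    second = (field(sib, 7, 6) == scale        # fld %1
              and field(sib, 5, 3) == index    # fld %2
              and field(sib, 2, 0) == base     # fld %3
              and field(sib, 5, 3) != 0b100
              and field(sib, 2, 0) != 0b101)
    return first and second
```

Note that the predicate leaves the reg_op bits of ModRM unconstrained: in the running example they are related to an argument of the enclosing instruction through a conjunction with the class pattern.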


4.4 Well-Formedness of Specifications 


The binary relation we define in the last section denotes a bijection only when the 
CSLED specification under investigation satisfies certain well-formedness condi- 
tions. These conditions guarantee that, given any bit string l, there is at most one 
abstract object related to l via the defined binary relation. Well-formedness is 
the composition of three properties which we call disjointness, compatibility, and 
uniqueness. We give and explain their definitions below. The logic for checking 
these conditions is embedded in the generation algorithms we will discuss in the 
next section and will be exploited for the validation of the generated encoders
and decoders. 


Disjointness. Given a pattern $P_1$ & $P_2$, it satisfies disjointness if $P_1$ and $P_2$
match disjoint fields.¹ To understand this, suppose $P_1$ and $P_2$ relate different
abstract arguments $a_1$ and $a_2$ to overlapping bits in a bit string $l$. Then, we
cannot determine if the values in the overlapping bits are for $a_1$ or $a_2$. Hence,
the derived binary relation cannot possibly be a bijection. Disjointness rules out
such a possibility.
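
For fields, disjointness reduces to a check on bit ranges. The sketch below uses the ModRM fields of the running example plus one deliberately overlapping field; the table `FIELDS`, the entry `bad`, and the function `disjoint` are hypothetical and only illustrate the idea, not the checking logic of the framework.

```python
# field name -> (token, high bit, low bit)
FIELDS = {
    "mod": ("ModRM", 7, 6),
    "reg_op": ("ModRM", 5, 3),
    "rm": ("ModRM", 2, 0),
    "bad": ("ModRM", 4, 2),   # hypothetical field overlapping reg_op and rm
}


def disjoint(f1: str, f2: str) -> bool:
    """Two fields are disjoint if they share no bit of the same token."""
    t1, hi1, lo1 = FIELDS[f1]
    t2, hi2, lo2 = FIELDS[f2]
    return t1 != t2 or hi1 < lo2 or hi2 < lo1
```

With this table, reg_op and rm are disjoint, whereas a pattern combining reg_op with the hypothetical field bad would be rejected.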


Compatibility. We call the types of sequences of tokens a pattern P matches 
the “shapes” of P. Given a pattern $P_1$ & $P_2$, it satisfies compatibility if every
possible shape of $P_1$ is in a prefix of every possible shape of $P_2$ when $P_2$ is a class
pattern (and vice versa). Enforcing compatibility simplifies the interpretation of
$P_1$ & $P_2$ when $P_1$ or $P_2$ is a class pattern with multiple branches that may match
bit strings with different shapes. Compatibility makes sense because for common
instruction formats it is always the case that the components matched by $P_1$ are
embedded in the longest common prefixes of all the possible shapes of $P_2$ when
$P_2$ is a class pattern (and vice versa). For example, in the example depicted
in Fig. 2, Reg_op is always embedded into the common prefix of all the possible 
shapes of addressing modes, i.e., the ModRM token. 


Uniqueness. Given a class pattern K, it satisfies uniqueness if for any bit string 
l, at most one of its branches matches l. Uniqueness is essential for ensuring the 
determinacy of decoders in the presence of class patterns. Fortunately, it implicitly
holds for common instruction formats as they are designed with determinacy
of decoding in mind. To concretely check the uniqueness implied by instruction
formats, we first define the structural condition for a branch with pattern $P$ as
the conjunction of the statically known constraints in $P$, denoted by $\llbracket P \rrbracket_{cond}$.
We then require that no two structural conditions of branches of the same class
can be satisfied simultaneously. This requirement allows us to uniquely determine
the branch used to construct a class component. For example, the structural
conditions of the first three branches of Addrmode are (mod = 0b11),
(mod = 0b00 & rm ≠ 0b100 & rm ≠ 0b101) and (mod = 0b00 & rm = 0b101).
Obviously, no pairwise combination of these conditions can be satisfied.
This is true even if we consider all the branches of Addrmode. Therefore,
there is at most one way to decode any addressing mode.
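
Because the structural conditions of Addrmode only mention the ModRM byte (once the index and base constraints of addr_sib are left out), uniqueness for this class can be checked by plain enumeration. The Python sketch below does this by brute force over all 256 byte values; it is a stand-in for illustration, whereas the framework discharges such conditions with an SMT solver. The names `CONDS` and `overlapping_pairs` are hypothetical.

```python
from itertools import combinations


def mod(byte: int) -> int:
    return (byte >> 6) & 0b11


def rm(byte: int) -> int:
    return byte & 0b111


# Structural conditions of the four Addrmode branches, restricted to ModRM.
CONDS = {
    "addr_r": lambda b: mod(b) == 0b11,
    "addr_ir": lambda b: mod(b) == 0b00 and rm(b) != 0b100 and rm(b) != 0b101,
    "addr_disp": lambda b: mod(b) == 0b00 and rm(b) == 0b101,
    "addr_sib": lambda b: mod(b) == 0b00 and rm(b) == 0b100,
}


def overlapping_pairs():
    """Return every pair of branches whose conditions hold for some byte."""
    return [
        (a, b)
        for a, b in combinations(CONDS, 2)
        if any(CONDS[a](x) and CONDS[b](x) for x in range(256))
    ]
```

An empty result means that at most one branch can match any ModRM byte, which is exactly the uniqueness property for this class.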


5 Generation of Encoders and Decoders 


We discuss the algorithm for generating encoders and decoders from CSLED 
specifications. The structures of these encoders and decoders closely match the 
relations derived from specifications. Furthermore, every operation in an encoder 
has a counterpart in the corresponding decoder, and vice versa. 


1 We abuse the notation by using P to denote suitable patterns such as J, A or O. 




5.1 Generation of Encoders 


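\[
\begin{array}{lcl}
G_E\llbracket \epsilon : tid, bs, args \rrbracket & ::= & \lfloor bs \rfloor \\
G_E\llbracket fid = n, bs, args \rrbracket & ::= & \mathit{write}_{fid}\ bs\ n \\
G_E\llbracket fid \neq n, bs, args \rrbracket & ::= & \mathit{assert}(\mathit{read}_{fid}\ bs \neq n) \\
G_E\llbracket \mathtt{fld}\ \%i, bs, args \rrbracket & ::= & \mathit{write}_{fid}\ bs\ args[i] \quad (\text{where } fid \text{ is the field id of } args[i]) \\
G_E\llbracket \mathtt{cls}\ \%i, bs, args \rrbracket & ::= & E_K(args[i], bs) \quad (\text{where } K \text{ is the class of } args[i]) \\
G_E\llbracket O_1 ; O_2, bs, args \rrbracket & ::= & l_1 \leftarrow \mathit{first\_n}(bs, |\llbracket O_1 \rrbracket_{tokens}|);\ l_2 \leftarrow \mathit{skip\_n}(bs, |\llbracket O_1 \rrbracket_{tokens}|); \\
 & & bs_1 \leftarrow G_E\llbracket O_1, l_1, args \rrbracket;\ bs_2 \leftarrow G_E\llbracket O_2, l_2, args \rrbracket;\ \lfloor bs_1 \mathbin{+\!\!+} bs_2 \rfloor \\
G_E\llbracket O_1 \& O_2, bs, args \rrbracket & ::= & bs_1 \leftarrow G_E\llbracket O_1, bs, args \rrbracket;\ G_E\llbracket O_2, bs_1, args \rrbracket \\
G_E\llbracket P ; J, bs, args \rrbracket & ::= & bs_1 \leftarrow G_E\llbracket P, bs, args \rrbracket;\ bs' \leftarrow \mathit{skip\_n}(bs, |bs_1|); \\
 & & bs_2 \leftarrow G_E\llbracket J, bs', args \rrbracket;\ \lfloor bs_1 \mathbin{+\!\!+} bs_2 \rfloor \\
G_E\llbracket J \& A, bs, args \rrbracket & ::= & bs_1 \leftarrow G_E\llbracket J, bs, args \rrbracket;\ G_E\llbracket A, bs_1, args \rrbracket
\end{array}
\]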


Fig. 7. Generation of encoders from patterns 


From every class $K$, we extract an encoder $E_K$ for its components. It is a partial
function that takes two arguments, a component $k$ and a bit string $l$ representing
the result previously generated by encoders, and outputs an updated bit string
if the encoding succeeds. We shall write $E_K(k, l) = \lfloor l' \rfloor$ to denote that $l'$ is the
result of encoding $k$ on top of $l$.

$E_K(k, l)$ is defined by recursion on the structure of $k$. For every branch $B$
of $K$, we generate a piece of Coq code from the pattern $P$ of $B$ for encoding
$k$. We then insert it into the definition of $E_K(k, l)$. We write $G_E\llbracket P, bs, args \rrbracket$ to
denote the code snippet so generated, where bs is the name of the generated
bit string at this point and args contains the names of the arguments to the
constructor. $G_E\llbracket P, bs, args \rrbracket$ is defined in Fig. 7 where we use the option monad
for sequencing the encoding operations. The first case is obvious. Code generated
by $G_E\llbracket fid = n, bs, args \rrbracket$ writes the constant $n$ into the field associated with fid.
$G_E\llbracket fid \neq n, bs, args \rrbracket$ checks that the corresponding field does not contain the
constant $n$ and returns none if the check fails. $G_E\llbracket \mathtt{fld}\ \%i, bs, args \rrbracket$ writes the value of
the $i$-th argument into the corresponding field. $G_E\llbracket \mathtt{cls}\ \%i, bs, args \rrbracket$ calls the
encoder for the class corresponding to cls %i. $G_E\llbracket O_1 ; O_2, bs, args \rrbracket$ encodes its
two parts recursively and concatenates the results together, where first_n(bs, n)
returns the first n bits in bs and skip_n(bs, n) skips the first n bits in bs and
returns the remaining ones. $G_E\llbracket O_1 \& O_2, bs, args \rrbracket$ first encodes data matching $O_1$,
and then passes the result to the encoding for $O_2$. The last two cases are similar.
Note that if the generated code occurs at the beginning of a branch, then bs
coincides with the input argument $l$. Otherwise, bs denotes intermediate results.
As we can see, all these cases follow the logical structure of CSLED specifications
we have described before.
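
To give a feel for the shape of the generated code, here is a hand-written Python analogue of the encoder for the addr_sib branch. It is a sketch under simplifying assumptions, not output of the framework: bit strings are Python strings with the most significant bit first, the option type is modelled with None, and the helper `write` is hypothetical.

```python
from typing import Optional


def write(token: str, hi: int, lo: int, value: int) -> str:
    """Overwrite bits hi..lo of a token (bit string, MSB first) with value."""
    width = hi - lo + 1
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the field")
    start = len(token) - 1 - hi
    return token[:start] + format(value, f"0{width}b") + token[start + width:]


def encode_addr_sib(scale: int, index: int, base: int,
                    modrm_in: str) -> Optional[str]:
    """Encode addr_sib on top of a partially filled 8-bit ModRM token."""
    # First judgment: mod = 0b00 & rm = 0b100 (writes two constants).
    tmp = write(modrm_in, 7, 6, 0b00)
    result0 = write(tmp, 2, 0, 0b100)
    # Second judgment: fld %1 & fld %2 & fld %3 & index != 0b100 & base != 0b101.
    tmp = write("0" * 8, 7, 6, scale)
    tmp = write(tmp, 5, 3, index)
    tmp = write(tmp, 2, 0, base)
    if index == 0b100 or base == 0b101:   # the two assertions
        return None
    result1 = tmp
    # Sequencing (;) concatenates the results of the two judgments.
    return result0 + result1
```

For instance, encoding the arguments (0b10, 0b001, 0b100) on top of a ModRM token that already carries reg_op = 0b011 produces the two bytes 0x1C and 0x8C.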
5.2 Generation of Decoders


From every class $K$, we extract a decoder $D_K$. It is a partial function such that
$D_K(l) = \lfloor (k, l_1, l_2) \rfloor$ holds iff $l = l' \mathbin{+\!\!+} l_2$, $l'$ is the binary representation of $k$,
and $l_1$ is the result of inverting the encoding operation, i.e., setting every bit the
decoder touches in $l'$ to 0. This extra return value is introduced to help with the
verification as we shall see in Sect. 6.


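\[
\begin{array}{lcl}
G_D\llbracket \epsilon : tid, bs, args \rrbracket & ::= & remains \leftarrow \mathit{skip\_n}(bs, tid);\ \lfloor (bs, remains) \rfloor \\
G_D\llbracket fid = n, bs, args \rrbracket & ::= & ori \leftarrow \mathit{clear}_{fid}\ bs;\ remains \leftarrow \mathit{skip\_n}(bs, tid); \\
 & & \lfloor (ori, remains) \rfloor \quad (\text{where } fid = tid\ (n_1 : n_2)) \\
G_D\llbracket fid \neq n, bs, args \rrbracket & ::= & ori \leftarrow \mathit{clear}_{fid}\ bs;\ remains \leftarrow \mathit{skip\_n}(bs, tid); \\
 & & \lfloor (ori, remains) \rfloor \quad (\text{where } fid = tid\ (n_1 : n_2)) \\
G_D\llbracket \mathtt{fld}\ \%i, bs, args \rrbracket & ::= & arg_i \leftarrow \mathit{read}_{fid}\ bs;\ ori \leftarrow \mathit{clear}_{fid}\ bs; \\
 & & remains \leftarrow \mathit{skip\_n}(bs, tid);\ \lfloor (ori, remains) \rfloor \\
 & & (\text{where } fid \text{ is the field id of } args[i]) \\
G_D\llbracket \mathtt{cls}\ \%i, bs, args \rrbracket & ::= & arg_i, origin, remains \leftarrow D_K(bs);\ \lfloor (origin, remains) \rfloor \\
 & & (\text{where } K \text{ is the class of } args[i]) \\
G_D\llbracket O_1 ; O_2, bs, args \rrbracket & ::= & ori_1, remains_1 \leftarrow G_D\llbracket O_1, bs, args \rrbracket; \\
 & & ori_2, remains_2 \leftarrow G_D\llbracket O_2, remains_1, args \rrbracket; \\
 & & \lfloor (ori_1 \mathbin{+\!\!+} ori_2, remains_2) \rfloor \\
G_D\llbracket O_1 \& O_2, bs, args \rrbracket & ::= & remains \leftarrow \mathit{skip\_n}(bs, |\llbracket O_2 \rrbracket_{tokens}|); \\
 & & ori, \_ \leftarrow G_D\llbracket O_2, bs, args \rrbracket; \\
 & & orilst, \_ \leftarrow G_D\llbracket O_1, ori, args \rrbracket; \\
 & & \lfloor (orilst, remains) \rfloor \\
G_D\llbracket P ; J, bs, args \rrbracket & ::= & ori_1, remains_1 \leftarrow G_D\llbracket P, bs, args \rrbracket; \\
 & & ori_2, remains_2 \leftarrow G_D\llbracket J, remains_1, args \rrbracket; \\
 & & \lfloor (ori_1 \mathbin{+\!\!+} ori_2, remains_2) \rfloor \\
G_D\llbracket J \& A, bs, args \rrbracket & ::= & ori, remains \leftarrow G_D\llbracket A, bs, args \rrbracket; \\
 & & orilst, \_ \leftarrow G_D\llbracket J, ori, args \rrbracket; \\
 & & \lfloor (orilst, remains) \rfloor
\end{array}
\]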


Fig. 8. Generation of decoders from patterns 


The first step of $D_K$ is to decide which branch of $K$ should be chosen for
decoding $l$. It can be done by checking the structural conditions derived from
the patterns of branches (which we have introduced in Sect. 4.4) against $l$. Specifically,
for the pattern $P$ of each branch of $K$, we translate its structural condition
$\llbracket P \rrbracket_{cond}$ into a decision procedure in Coq (a function returning boolean values) in
a straightforward manner. We then insert an if-statement to check if $\llbracket P \rrbracket_{cond}$ can
be satisfied. If so, we start the decoding process for this branch. Otherwise, we
repeatedly check other branches until a matching case is found. Note also that
by uniqueness, there is at most one structural condition that can be satisfied.
Therefore, $D_K$ is deterministic in choosing branches.

Once a matching branch is found, we use the algorithm $G_D\llbracket P, bs, args \rrbracket$ (the
counterpart of $G_E\llbracket P, bs, args \rrbracket$) to generate a piece of Coq code for decoding
the arguments of this branch. It is defined in Fig. 8. Similar to encoding, the
generated code snippet follows the logical structure of CSLED specifications.
The function $\mathit{clear}_{fid}\ bs$ sets the bits of the field fid in bs to 0. Note that the
decoding operations are exactly the inversion of those in Fig. 7. Note also that
the fourth and fifth cases in Fig. 8 are responsible for decoding the arguments
and storing them in $arg_i$. By applying the corresponding constructor to these
arguments, we get the output component $k$, which together with the two values
returned by $G_D$ forms the final output of $D_K$.
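
The decoding side can be sketched in the same hand-written Python style for the addr_sib branch. Again this is an illustration under assumptions rather than generated code: bit strings are Python strings with the most significant bit first, the component is returned as a plain tuple, and the helpers `read`, `clear` and `bf_addr_sib` are hypothetical counterparts of the operations named in the text.

```python
from typing import Optional, Tuple


def read(token: str, hi: int, lo: int) -> int:
    """Value of bits hi..lo of a token given as a bit string (MSB first)."""
    return int(token[len(token) - 1 - hi: len(token) - lo], 2)


def clear(token: str, hi: int, lo: int) -> str:
    """Set bits hi..lo of a token to 0."""
    start, width = len(token) - 1 - hi, hi - lo + 1
    return token[:start] + "0" * width + token[start + width:]


def bf_addr_sib(bs: str) -> bool:
    """Decision procedure for the structural condition of addr_sib."""
    if len(bs) < 16:
        return False
    modrm, sib = bs[:8], bs[8:16]
    return (read(modrm, 7, 6) == 0b00 and read(modrm, 2, 0) == 0b100
            and read(sib, 5, 3) != 0b100 and read(sib, 2, 0) != 0b101)


def decode_addr_sib(bs: str) -> Optional[Tuple[tuple, str, str]]:
    """Return (component, reverted bits, remaining bits) or None."""
    if not bf_addr_sib(bs):
        return None
    modrm, sib, remains = bs[:8], bs[8:16], bs[16:]
    ori1 = clear(clear(modrm, 7, 6), 2, 0)             # revert mod and rm
    args = (read(sib, 7, 6), read(sib, 5, 3), read(sib, 2, 0))
    ori2 = clear(clear(clear(sib, 2, 0), 5, 3), 7, 6)  # revert the SIB fields
    return ("addr_sib",) + args, ori1 + ori2, remains
```

The reverted bits keep whatever the decoder did not touch, such as the reg_op field of ModRM, so that an enclosing decoder can still read it.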


5.3 Generation for the Running Example 


We show the representative cases of the generated encoder and decoder for our 
running example in Fig. 9. They include the encoding and decoding procedures 
for the fourth branch of Addrmode (the most complicated one). We can see that 
the encoding and decoding operations are exactly the inverses of each other. The 
encoder first writes the fields in ModRM and then those in SIB. Conversely, 
the decoder first reads the fields in ModRM and then those in SIB. Finally, it 
forms the component and returns the reverted and remaining bits. The function 
BF_addr_sib is the decision procedure generated from the structural condition 
for the fourth branch of Addrmode. We also show the encoding and decoding 
procedures for the first add instruction in Fig. 9. Their structures are very similar 
to those of Addrmode. 


6 Validation of Encoders and Decoders 


In this section, we discuss how to exploit the logical structure of and the well- 
formedness conditions for CSLED specifications to automatically synthesize the 
proofs of consistency and soundness for encoders and decoders. 


6.1 Synthesizing the Proof of Consistency 


The consistency between encoders and decoders is composed of two properties 
and stated as follows: 


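Theorem 1 (Consistency between Encoders and Decoders). Given any
class $K$, its encoder $E_K$ and decoder $D_K$ are consistent with each other if they
invert each other. That is, the following properties hold:

$$\forall k\, l\, r\, l',\ \mathit{valid\_input}_K(l) \Longrightarrow E_K(k, l) = \lfloor r \rfloor \Longrightarrow D_K(r \mathbin{+\!\!+} l') = \lfloor (k, l, l') \rfloor.$$
$$\forall k\, l\, r\, l',\ D_K(r \mathbin{+\!\!+} l') = \lfloor (k, l, l') \rfloor \Longrightarrow E_K(k, l) = \lfloor r \rfloor.$$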
Definition encode_addrmode instance input :=
  match instance with
  | addr_sib arg1 arg2 arg3 =>
    (* Encode ModRM *)
    let ModRM := input in
    let tmp := write_mod ModRM b["00"] in
    let tmp := write_rm tmp b["100"] in
    let result0 := tmp in
    (* Encode SIB *)
    let SIB := zeros 8 in
    let tmp := write_scale SIB arg1 in
    let tmp := write_index tmp arg2 in
    let tmp := write_base tmp arg3 in
    let index := read_index tmp in
    let base := read_base tmp in
    do _ <- assert(index ≠ b["100"]);
    do _ <- assert(base ≠ b["101"]);
    let result1 := tmp in
    (* Concatenate the results of
       encoding ModRM and SIB *)
    Some (result0++result1)
  end

Definition decode_addrmode bs :=
  if BF_addr_sib bs then
    (* Revert the encoding of ModRM *)
    let ori := clear_mod bs in
    let ori := clear_rm ori in
    let ori1 := ori in
    do remains <- skipn bs 8; (* Skip ModRM *)
    (* Decode SIB to get the arguments
       and revert the encoding of SIB *)
    let bs := remains in
    let arg3 := read_base bs in
    let ori := clear_base bs in
    let arg2 := read_index ori in
    let ori := clear_index ori in
    let arg1 := read_scale ori in
    let ori := clear_scale ori in
    let ori2 := ori in
    do remains <- skipn bs 8; (* Skip SIB *)
    (* Return the result *)
    Some (addr_sib arg1 arg2 arg3,
          ori1++ori2, remains)
  else if BF_addr_r bs then ...

Definition BF_addr_sib bs :=
  let ModRM := firstn bs 8 in
  (* mod = 0b00 ∧ rm = 0b100 *)
  let result0 :=
    (ModRM & b["11000111"]) = b["00000100"] in
  let tmp := skipn bs 8 in
  let SIB := firstn tmp 8 in
  (* index ≠ 0b100 *)
  let result10 :=
    (SIB & b["00111000"]) ≠ b["00100000"] in
  (* base ≠ 0b101 *)
  let result11 :=
    (SIB & b["00000111"]) ≠ b["00000101"] in
  result0 ∧ result10 ∧ result11.

Definition encode_instr instance input :=
  match instance with
  | AddGvEv arg1 arg2 =>
    let tmp := write_reg_op ModRM arg1 in
    do tmp <- encode_addrmode arg2 tmp;
    ...
  end.

Definition decode_instr bs :=
  if BF_AddGvEv bs then
    do arg2, ori, remains <-
      decode_addrmode bs;
    let arg1 := read_reg_op ori in
    let ori := clear_reg_op ori in

Definition BF_AddGvEv bs :=
  let Opcode := firstn bs 8 in
  (Opcode & b["11111111"]) = b["00000011"].


Fig. 9. Encoders and decoders generated from the running example 


We first discuss how the proof for the first property in Theorem 1 is generated.
Here, the assumption $\mathit{valid\_input}_K(l)$ asserts that all the bits in $l$ that may be
modified by $E_K$ must be 0. This is necessary to ensure that the decoder can
revert the resulting bit string back to its initial state by setting them to 0 (i.e.,
the second result of decoding is the same as $l$).

The proof proceeds by induction on the structure of $k$. For each branch
$B$ with the pattern $P$, we generate a lemma and its proof that the decision
procedure generated from $\llbracket P \rrbracket_{cond}$ as described in Sect. 5.2 always returns true
given any bit string generated by the encoder for $P$. With this lemma, the proof
for the “symmetric” case where the decoder takes the same branch as the encoder
reduces to proving that the encoder and decoder generated from $P$ are inverses
of each other. This proof is straightforward by the definitions of $G_E$ and $G_D$
in Sect. 5. An important point to note is that, for any pattern cls %i, we need
to recursively apply the consistency lemma for its corresponding class, which
in turn requires us to establish a valid_input assumption. By the disjointness
property in Sect. 4.4, we can easily conclude that the encodings of sub-components
do not interfere with each other, and thereby the desired valid_input assumption
can be derived.

To finish the proof, we need to show that the “asymmetric” cases are not
possible. For each asymmetric branch $\beta'$ with the pattern $P'$, we have that
$[\![P']\!]_{cond}$ holds by the decision procedure guarding this branch. Furthermore,
by the above reasoning, $[\![P]\!]_{cond}$ holds. We hence have that the conjunction
of $[\![P]\!]_{cond}$ and $[\![P']\!]_{cond}$ holds. However, this contradicts the uniqueness
property given in Sect. 4.4. Therefore, the decoder can never go into a branch
different from the encoder. Continuing with our running example, suppose we
are proving the consistency of the encoder and decoder for Addrmode. Further
suppose we are working on the branch with the constructor addr_sib. Then, the
verification condition for the asymmetric case with the constructor addr_r is

$$\forall bs,\ (\mathit{read}_{mod}\ bs = \texttt{0b00} \land \mathit{read}_{rm}\ bs = \texttt{0b100} \ldots) \land (\mathit{read}_{mod}\ bs = \texttt{0b11})$$


which cannot possibly hold (for simplicity we omit the conditions for index and 
base). We note that such conditions can be easily checked by any SMT solver
with the theory of bit-vectors, and we use Z3 [5] to validate them. This checking 
can also be directly formalized in Coq, which we plan to do in the future. 
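
To illustrate why this verification condition is unsatisfiable, the following sketch swaps the SMT query for exhaustive enumeration, which is feasible here because the simplified condition depends only on the 8 bits of the ModRM byte. The helper names `read_mod` and `read_rm` mirror the accessors in the condition; their bit positions follow the masks used in Fig. 9, and the omission of the index and base conditions matches the simplification in the text.

```python
def read_mod(modrm: int) -> int:
    """Bits 7..6 of the ModRM byte."""
    return (modrm >> 6) & 0b11


def read_rm(modrm: int) -> int:
    """Bits 2..0 of the ModRM byte."""
    return modrm & 0b111


def asymmetric_vc_holds(modrm: int) -> bool:
    """Conjunction of the addr_sib guard and the addr_r guard on one byte."""
    sib_branch = read_mod(modrm) == 0b00 and read_rm(modrm) == 0b100
    r_branch = read_mod(modrm) == 0b11
    return sib_branch and r_branch


# Every ModRM byte for which both guards hold; an empty list means "unsat".
witnesses = [b for b in range(256) if asymmetric_vc_holds(b)]
```

An empty `witnesses` list plays the role of an "unsat" answer from the solver: no byte can have mod equal to both 0b00 and 0b11.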

Finally, the second property in Theorem 1 can be proved by induction on k 
in a similar fashion. We elide a discussion of its proof. 


6.2 Synthesizing the Proof of Soundness 


As we have discussed in Sect. 4.3, the relational specifications extracted from
CSLED specifications are tightly related to the actual instruction formats. Thus, 
it is reasonable to check the soundness of the generated encoders and decoders 
against these specifications. The relational specifications are easily translated 
into Coq definitions and we shall use the same notations. The soundness of 
encoders and decoders is then stated as follows: 


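Theorem 2 (Soundness of Encoders and Decoders). Given any class $K$,
its encoder $E_K$ is sound if the following property holds:

$$\forall k\ l\ r,\ E_K(k, l) = \lfloor r \rfloor \implies \mathcal{R}[\![K]\!]\ k\ r.$$

Similarly, its decoder $D_K$ is sound if the following holds:

$$\forall k\ l\ r\ l',\ D_K(r \mathbin{+\!\!+} l') = \lfloor (k, l, l') \rfloor \implies \mathcal{R}[\![K]\!]\ k\ r.$$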


The soundness of the encoder is easily proved by induction on the structure of k.
As in the consistency proofs, we need to exploit the well-formedness conditions
of CSLED specifications at relevant points. The soundness of the decoder is a
corollary of the soundness of the encoder and the second consistency property.
7 Evaluation


Besides the CSLED language, our framework has two major parts: 1) the algo- 
rithms for generating encoders, decoders and their proofs and 2) a Coq library 
containing the definitions and properties of basic types (including bits, bytes 
and bit strings) and a collection of automation tactics (Ltac definitions) for 
proof synthesis. The generation algorithms amount to 5,193 lines of C++ code 
(excluding comments and empty lines, and likewise for the following statistics). 
The Coq library amounts to 1,036 lines of Coq code (written in Coq 8.11.0 and 
counted using coqwc). We also make use of the monad definitions and some basic 
data formats in CompCert’s library [13]. The whole framework took six
person-months to develop.


Table 2. The lines of generated Coq code 


Component                | Lines of definitions | Lines of proofs
Relational specification |                 1762 |               0
AST, encoder and decoder |                 5677 |               0
Verification conditions  |                37011 |            4402
Consistency proof        |                  295 |           30841
Soundness proof          |                   60 |            7193
Total                    |                44805 |           42436


To evaluate the effectiveness of our framework, we have written a CSLED 
specification for a total of 186 representative X86-32 instructions which cover the 
operands with the most complicated formats (e.g., addressing modes) and are 
sufficient for supporting the assembling process in CompCert’s X86-32 backend. 
The specification is very succinct, containing only 260 lines of CSLED code. 
From this specification, our framework automatically generates around 87k lines 
of Coq code which form the verified encoder and decoder. The lines of Coq 
definitions and proofs for individual components are shown in Table 2. Note that 
the verification conditions account for a major part of the definitions because 
we need to consider all the possible combinations of structural conditions for 
the proofs of consistency and soundness. The Coq proofs related to verification 
conditions are for identifying the concrete forms of structural conditions. As 
expected, the consistency proof is the most complicated one among all the proofs. 

To evaluate the performance of the generated encoder and decoder, we ran- 
domly generate four sets of instructions, encode them into bit strings, and decode 
the bit strings back. The executable encoder and decoder are obtained by extract- 
ing Coq definitions into OCaml programs and compiling with OCaml 4.08.0. 
We repeat this experiment 30 times on a machine with an Intel(R) i7-4980HQ
CPU@2.8 GHz and 16 GB memory. For comparison, we conduct the same experi- 
ments on the hand-written encoder and decoder in the X86-32 back-end of Com- 
pCertELF [20]. The results are shown in Table 3. For each test case, it shows the 
Table 3. Performance evaluation


              | CSLED                             | Hand-Written
No. of Instr. | Enc. Time (s)   | Dec. Time (s)   | Enc. Time (s)   | Dec. Time (s)
              | Med    Var.(%)  | Med    Var.(%)  | Med    Var.(%)  | Med    Var.(%)
6000          | 0.32   0.00     | 0.56   0.00     | 0.01   0.00     | 0.01   0.00
12000         | 0.64   0.00     | 1.12   0.00     | 0.01   0.00     | 0.02   0.00
18000         | 0.98   0.03     | 1.70   0.15     | 0.02   0.00     | 0.03   0.01
60000         | 3.11   0.16     | 5.43   0.01     | 0.08   0.00     | 0.09   0.01


number of randomly generated instructions and the median time (in seconds)
and the variance (in percentage) for encoding and decoding. We observe that
the automatically generated encoder and decoder perform reasonably well, but
are significantly slower than the hand-written ones. This is because 1) the
hand-written encoder and decoder in CompCertELF currently support significantly
fewer instructions (about 20) than the CSLED ones due to the complexity of
manual implementation, and 2) the hand-written ones are manually optimized while
the auto-generated ones are not optimized at all. We plan to address these
issues by optimizing our generation algorithms in the future.
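
To put the gap in numbers, the following sketch derives slowdown factors and per-instruction costs from the medians in Table 3. The helper names are introduced here for illustration only. Because the hand-written medians are reported to two decimals, the ratios for the smaller test sets are coarse; for the 60000-instruction set they come out at roughly 39x for encoding and 60x for decoding.

```python
# (instructions, CSLED enc, CSLED dec, hand-written enc, hand-written dec),
# all times are the median seconds reported in Table 3.
TABLE3 = [
    (6000, 0.32, 0.56, 0.01, 0.01),
    (12000, 0.64, 1.12, 0.01, 0.02),
    (18000, 0.98, 1.70, 0.02, 0.03),
    (60000, 3.11, 5.43, 0.08, 0.09),
]


def slowdown(row):
    """Ratio of CSLED time to hand-written time, for encoding and decoding."""
    _, c_enc, c_dec, h_enc, h_dec = row
    return c_enc / h_enc, c_dec / h_dec


def csled_micros_per_instr(row):
    """Average microseconds per instruction for the generated encoder/decoder."""
    n, c_enc, c_dec, _, _ = row
    return 1e6 * c_enc / n, 1e6 * c_dec / n
```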


8 Related Work and Conclusion 


We compare our framework with existing work on specification languages of 
instruction sets, verified parsing and pretty printing, and formalized ISAs. 

There exists a lot of work on developing languages for specifying ISAs. Their 
major deficiency is the lack of formal guarantees. For example, the nML specifi- 
cation language employs attribute grammars to describe instruction sets [7]. For 
another example, EEL uses machine independent primitives to provide syntac- 
tic and semantic information of instructions [12]. The most relevant work in this 
category is the SLED language which our CSLED is based upon [15]. The pat- 
terns in SLED can only describe constraints on tokens and fields. By contrast, 
CSLED contains class patterns for accurately characterizing the structures of 
components. This extension enables CSLED to capture the bijection between 
the abstract and concrete forms of instructions. 

Instruction decoding and encoding are special cases of parsing and pretty
printing, respectively. Although there was early work on verifying that
parsing and pretty printing are inverses of each other by formulating them as
bijections [1,10], this requirement was perceived as too strong [16]. Most of the
recent work on verified parsing and pretty printing is dedicated to verifying
parser generators based on context-free grammars, regular expressions, parser
combinators, or general data formats [3,11,17]. Some of it specializes in
verifying encoder-decoder pairs [6,14,19,21]. These efforts mostly deal with
general and ambiguous grammars or specifications where a bijection is difficult
(if not impossible) to establish. By contrast, we intentionally restrict the
expressiveness of CSLED specifications to make proving consistency possible.
Specifically, the syntax presented in Fig. 4 implies that CSLED specifications
can only match sequences of tokens with finite lengths and shapes, making it
strictly weaker than regular expressions, yet sufficiently strong to precisely
capture the common instruction formats.

There is also abundant work on the development of formal ISA specifications 
(e.g., [2,4,8,9]). However, almost all of them focus on the problem of rigorously 
defining the semantics of ISAs (such as their sequential behaviors, concurrency 
models and interrupt behaviors). Although formalized encoders or decoders (or 
both) are sometimes generated (e.g., in Coq or Isabelle/HOL), there is no formal 
verification of the soundness or consistency of instruction encoding and decoding 
which only concerns the syntax of instructions. 

In this paper, we have presented a framework for specifying instruction for- 
mats and for automatically generating and verifying encoders and decoders based 
on such specifications. The verified encoders and decoders are consistent with 
each other (being inverses of each other) and sound (conforming to high-level 
specifications). Consistency is provable in our framework because our specifica- 
tions capture the bijections inherently embedded in instruction formats. In the 
future, we would like to apply this framework to a major part of the X86-32 and
X86-64 instructions and also to other ISAs, thereby demonstrating the versatility
and scalability of our framework.
Abstract. Several automatic verification tools have been recently developed
to verify subsets of LLVM’s optimizations. However, none of these
tools has robust support to verify memory optimizations. 

In this paper, we present the first SMT encoding of LLVM’s mem- 
ory model that 1) is sufficiently precise to validate all of LLVM’s intra- 
procedural memory optimizations, and 2) enables bounded translation 
validation of programs with up to hundreds of thousands of lines of code. 
We implemented our new encoding in Alive2, a bounded translation val- 
idation tool, and used it to uncover 21 new bugs in LLVM memory opti- 
mizations, 10 of which have been already fixed. We also found several 
inconsistencies in LLVM IR’s official specification document (LangRef) 
and fixed LLVM’s code and the document so they are in agreement. 


1 Introduction 


Ensuring that LLVM is correct is crucial for the safety and reliability of the 
software ecosystem. There has been significant work towards this goal including, 
e.g., formally specifying the semantics of the LLVM IR (intermediate
representation). This entails describing precisely what each instruction does and
how it handles special cases such as integer overflows, division by zero, or
dereferencing out-of-bounds pointers [8,24,26,29,47]. There has also been work on
automatic verification of classes of optimizations, such as peephole
optimizations [25,31], semi-automated proofs [48], translation validation [20,35,42,44],
and fuzzing [23,46]. All this work uncovered several hundred bugs in LLVM.

While there has been great success in improving correctness of scalar opti- 
mizations, current verification tools only support basic memory optimizations, if 
any. Since memory operations can take a significant fraction of a program’s run 
time, memory optimizations are very important for performance. The implemen- 
tation of these optimizations and related pointer analyses tends to be complex, 
which further justifies the investment in verifying them. 

Verifying programs with memory operations is very challenging and it is hard 
to scale automatic verification tools that handle these. The main issue lies with 
pointer aliasing: which objects does a given memory operation access? Without 
any prior information, a verifier must consider that each operation may load or 
© The Author(s) 2021 
store from any live object (global variables and stack/heap allocations). This
creates a big case split for the underlying constraint solver to (attempt to) solve. 

Since automatic verification of the source code of memory optimizations is
out of reach at the moment, we focus on bounded translation validation [30,40]
(BTV) instead. (Bounded) translation validation consists of verifying that
an optimization was correct for a particular input program (up to a bounded
unrolling of loops) rather than verifying its correctness for all input programs.

In this paper, we present the first SMT encoding of LLVM’s memory 
model [24] that is precise enough to validate all of LLVM’s intraprocedural mem- 
ory optimizations. The design of the encoding was guided by practical insights of 
the common aliasing cases in BTV to achieve better performance. For example, 
we observed that in most cases we can cheaply infer whether a pointer aliases 
with a locally-allocated or a global object (but not both). Therefore, our encod- 
ing case-splits itself on this property rather than leaving that to the SMT solver, 
as we can cheaply resolve the case split for over 95% of the cases. 

The second contribution of this paper is a new semantics for heap allocation 
for the verification of optimizations for real-world C/C++ programs. Although 
LLVM’s memory model has a reasonable semantics for heap allocations [24], we 
realized it was not suitable for verifying optimizations. In some programming 
styles, the result of functions such as malloc is not checked against NULL and 
the resulting pointer is dereferenced right away. Since malloc can return NULL 
in some executions, we could end up proving that some undesirable optimiza- 
tions were correct since the program triggers undefined behavior in at least one 
execution. We propose a new semantics for heap allocations in this paper that 
is better suited for the verification of optimizations. 

The third contribution is the identification of approximations to the SMT 
encoding such that it is still sufficiently precise to verify (and find bugs) in 
LLVM’s memory optimizations. This is possible since for translation validation 
we only need to be as precise as LLVM’s static analyses (e.g., in the encoding 
of aliasing rules), and therefore we do not need to consider extremely precise 
analyses nor arbitrary transformations. Compilers have limited reasoning power 
by construction in order to keep compilation time reasonable. 

We implemented our new SMT encoding of LLVM’s memory model in 
Alive2 [30], a bounded translation validation tool for LLVM. We used Alive2 
to find and report 21 previously unknown bugs in LLVM memory optimizations, 
10 of which have already been fixed. 

To summarize, the contributions of this paper are as follows. 


1. The first SMT encoding of LLVM’s memory model that is precise enough to 
verify all of LLVM’s intraprocedural memory optimizations. 

2. A new semantics for heap allocations for the verification of optimizations of 
real-world C/C++ programs (Sect. 5.1). 

3. A set of approximations to the SMT encoding to further improve the perfor- 
mance of verification without introducing false positives or false negatives in 
practice (Sect. 9). 
4. Thorough evaluation of LLVM’s memory model against LLVM’s implementation,
which uncovered deviations from the model (Sect. 10.3).

5. Identification of 21 previously unknown bugs in LLVM. We present a few 
examples in Sect. 10.1. 


2 Overview 


Consider the functions below in C:¹ a source (original) function on the left and
a target (optimized) function on the right. According to the semantics of
high-level languages, and also of LLVM IR, a pointer received as an argument or a
callee cannot guess the address of a memory region allocated within a function.
That is, pointer q is not aliased with p or r, nor touched by g(p+1). Although the
caller of f may guess the address of q in practice, that behavior is excluded by the
language semantics because p’s object (provenance) cannot be a fresh one like q.
If p happens to alias q, accessing such a pointer triggers undefined behavior (UB).


1  int f(int *p) {              1' int f(int *p) {
2    int *q = malloc(4);        2'   // q removed
3    *q = 42;                   3'
4    int *r = g(p+1);           4'   int *r = g(p+1);
5    *r = 37;                   5'   *r = 37;
6    return *q;                 6'   return 42;
7  }                            7' }


The provenance rules allow LLVM to forward the stored value in line 3 to line 
6, and therefore line 6’ simply returns 42. As the value stored to *q is not used 
anymore and pointer q does not escape, LLVM also removes the heap allocation. 
Next we show how to verify this example. Note that we do not require the two 
programs to be aligned; the example is aligned to make it easier to understand. 


2.1 Verifying the Example Transformation 


We start by defining two auxiliary functions that encode the effect of memory 
operations on the program state. Let state S = (m, ub) be a pair, where m is a 
memory and ub a boolean that tracks whether the program has already executed 
UB or not. Let p be the accessed pointer, and v the stored value. The definition 
of functions load and store is as follows: 


$$\mathit{load}\ p\ S ::= \left(\ \boxed{\mathit{load}(p, S.m)}\ ,\ \left(S.m,\ S.ub \lor \lnot\,\boxed{\mathit{deref}(p, \mathit{sizeof}(*p), S.m)}\right)\right)$$

$$\mathit{store}\ p\ v\ S ::= \left(\ \boxed{\mathit{store}(p, v, S.m)}\ ,\ S.ub \lor \lnot\,\boxed{\mathit{deref}(p, \mathit{sizeof}(*p), S.m)}\ \right)$$


load returns a pair with the loaded value and the updated state, where ub
is further constrained to ensure that pointer p is dereferenceable for at least the
size of the loaded type. Similarly, store returns the updated state. The gray
boxes (rendered here as $\boxed{\cdots}$) encode SMT expressions; we describe these in the
next section.
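
The two definitions can be read as ordinary state-passing functions. The following is a minimal concrete sketch, not the SMT encoding itself: it assumes a pointer is a (block id, offset) pair, models memory as a dictionary from block ids to byte tuples, and passes the access size explicitly instead of deriving it from the pointee type. All type and helper names are introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Pointer = Tuple[int, int]             # (block id, byte offset); representation assumed here
Memory = Dict[int, Tuple[int, ...]]   # block id -> immutable sequence of bytes


@dataclass(frozen=True)
class State:
    m: Memory   # memory
    ub: bool    # True once the program has executed UB


def deref(p: Pointer, size: int, m: Memory) -> bool:
    """p is dereferenceable for `size` bytes iff the access stays inside its block."""
    bid, off = p
    return bid in m and 0 <= off and off + size <= len(m[bid])


def load(p: Pointer, size: int, s: State) -> Tuple[Optional[Tuple[int, ...]], State]:
    """Return (loaded bytes, updated state); memory is unchanged, ub may be raised."""
    ok = deref(p, size, s.m)
    bid, off = p
    value = s.m[bid][off:off + size] if ok else None  # None stands in for "any value"
    return value, State(s.m, s.ub or not ok)


def store(p: Pointer, value: Tuple[int, ...], s: State) -> State:
    """Return the updated state; an out-of-bounds store only raises ub."""
    ok = deref(p, len(value), s.m)
    if not ok:
        return State(s.m, True)
    bid, off = p
    block = s.m[bid]
    new_block = block[:off] + tuple(value) + block[off + len(value):]
    return State({**s.m, bid: new_block}, s.ub)
```

As in the definitions, the UB flag only ever accumulates: once an access fails the dereferenceability check, every later state carries `ub = True`.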


1 We use the syntax of C for many of the examples in this paper to make them easier 
to read, even though we consider the semantics of LLVM IR. 
Table 1. States and axioms after executing each of the lines of f.


#  | Source (inputs: p, m0, ub0)                     | #  | Target (inputs: p′, m0′, ub0′)
2  | S1 := (m0, ub0)                                 | 2′ | -
   | A1 := q is fresh                                |    |
3  | S2 := store q 42 S1                             | 3′ | -
4  | S3 := (mg, S2.ub ∨ ubg)                         | 4′ | S1′ := (mg′, ub0′ ∨ ubg′)
   | A2 := r is not q ∧ mg coincides with S2.m on q  |    |
5  | S4 := store r 37 S3                             | 5′ | S2′ := store r′ 37 S1′
6  | O := load q S4                                  | 6′ | O′ := (42, S2′)


1. Encoding the output states. Table 1 shows the state after executing each of
the programs’ lines. $p$, $m_0$, and $ub_0$ are SMT variables for the input pointer and
for the memory and UB flag of the caller of f, respectively. The target’s
corresponding variables are primed. Meta variables are upper-cased and SMT
variables are lower-cased.

On line 2, q is assigned a pointer to a new object (encoded in axiom A1). On 
line 3, ‘*q = 42’ updates the state using store. 

On line 4, the return value, output memory, and UB of g(p+1) are represented
with fresh variables $r$, $m_g$, and $ub_g$, respectively. Axiom $A_2$ encodes the
provenance rules: the return value cannot alias with locally non-escaped
pointers (q) and only the remaining objects are modified. Line 4’ does not need
these axioms because there are no locally-allocated objects in the target function.

Finally, the outputs O and O’ are a pair of return value and state. 


2. Relating the source and target’s states. To prove correctness of a
transformation, we must first establish refinement between the input states of the
source/target functions. Refinement ($\sqsupseteq$) is used rather than equality because
it is allowed for the source’s caller to give less defined inputs than the target’s.


$$A_{in} := (p \sqsupseteq p') \land (m_0 \sqsupseteq m_0') \land (ub_0' \Rightarrow ub_0)$$


The inputs and outputs of function calls are also related using refinement. 
For any pair of calls in the source and target functions, if the target’s inputs 
refine those of the source, the target’s output also refines the source’s output. 
The example only has one function call pair: 


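$$A_{call} := (S_2.m \sqsupseteq m_0') \Rightarrow \left( (m_g \sqsupseteq m_g') \land (r \sqsupseteq r') \land (ub_g' \Rightarrow ub_g) \right)$$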


We can now state the correctness theorem for the example transformation. 
For any input, if the axioms hold, the output of the target must refine that of 
the source for some internal nondeterminism in the source (e.g., the address of 
pointer q). Output is refined iff (i) the source triggers UB, or (ii) the target 
triggers no UB, and the target’s return value and memory refine those of the 
source. 
$$\forall p, p', m_0, m_0', ub_0, ub_0', m_g, m_g', ub_g, ub_g'.\ \exists q.\ (A_1 \land A_2 \land A_{in} \land A_{call}) \Rightarrow O \sqsupseteq O'$$
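
The output refinement condition stated above can be sketched as a small predicate. This is an illustration only: outputs are modeled as (return value, memory, ub) triples, and value and memory refinement default to plain equality, which is a simplifying assumption of this sketch (the actual relations are described in later sections of the paper).

```python
def output_refines(src, tgt,
                   value_refines=lambda a, b: a == b,
                   memory_refines=lambda a, b: a == b) -> bool:
    """True iff the target output refines the source output.

    Holds when (i) the source triggers UB, or (ii) the target triggers no UB
    and its return value and memory refine those of the source.
    """
    src_val, src_mem, src_ub = src
    tgt_val, tgt_mem, tgt_ub = tgt
    if src_ub:
        return True
    return (not tgt_ub
            and value_refines(src_val, tgt_val)
            and memory_refines(src_mem, tgt_mem))
```

The first clause is what makes an unchecked `malloc` problematic for verification: if the source has UB in some execution, any target behavior is accepted for that execution.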


2.2 Efficiently Encoding LLVM’s Memory Model and Refinement 


We now present our key ideas for efficiently encoding LLVM’s memory model 
and refinement (the gray boxes) in SMT, which is one of our main contributions. 


1. Pointers. We represent a pointer as a pair (bid, o) of a block id (i.e., its
provenance) and an offset within, so that we can easily detect out-of-bound
accesses: accessing (bid, o) in memory m triggers UB unless 0 ≤ o < m[bid].size,
from which deref((bid, o), sz, m) naturally follows.


2. Bounding the number of blocks. Our first observation is that we can safely 
bound the number of memory blocks for bounded translation validation since 
loops are unrolled for a fixed number of iterations. As a result, we can use a 
(fixed-length) bit-vector to encode block ids. 

For the example source function, four blocks are sufficient: three for pointers 
p, q, r as they may all point to different blocks, and an extra to represent all the 
other blocks that are not syntactically present but are accessible by function g. 

For the sake of simplifying the example, we ignore that p, q, r may be null.
Our model does not make such an assumption; we explain later how null is handled.


3. Aliasing rules. Several of the aliasing rules are encoded for free as we can
distinguish most blocks by construction. First, we use the most significant bit of 
the block ids to distinguish local (1) from non-local (0) blocks. Second, we assign 
constant ids whenever possible (e.g., global variables and stack allocations). 

For the example source function, (without loss of generality) we set the block
ids of q, p and the extra block to 100(2), 000(2), and 011(2) (in binary format),
respectively. However, we cannot fix the block id of r and instead give the
constraint that it should be either 000(2) or 001(2) since r may alias with p but not
with q. This establishes the alias constraints in A1 and A2 for free.


4. Memory accesses. In order to leverage the fact that each pointer may range 
over a small number of blocks as seen above, we use one SMT array per block 
(from an offset to a byte) instead of using a single global array (from a pointer 
to a byte). For the latter, it becomes harder to exploit non-aliasing guarantees 
since all stores to different blocks are grouped together. 


For the example source function, $m_0$ consists of four arrays $m_0^{(100)}$, $m_0^{(000)}$,
$m_0^{(001)}$, and $m_0^{(011)}$ for the four blocks. Then, since q’s block id is 100(2),
store q 42 S1 at line 3 only updates the array $m_0^{(100)}$, leaving the others
unchanged. Similarly, store r 37 S3 at line 5 only updates the arrays for blocks
000(2) and 001(2), using the SMT if-then-else expression on r’s block id. Finally,
load q S4 at line 6 reads from the updated array at 100(2), thereby easily
realizing that the read value is 42.
5. Refinement. The value/memory refinement $\sqsupseteq$ is defined based on a mapping
between source and target blocks, which we efficiently encode leveraging the
alignment information between source and target as much as possible (Sect. 7).
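
The effect of keeping one array per block (idea 4 above) can be illustrated with a small concrete model. The sketch below replaces the solver's case split on r's block id by an explicit loop over the two candidates 000(2) and 001(2). The block-id constants follow the example; storing whole words at offset 0 and ignoring the memory changes made by g are simplifications assumed here.

```python
Q, P, R_OWN, EXTRA = 0b100, 0b000, 0b001, 0b011  # block ids of the example


def store(mem, bid, off, value):
    """Return a copy of mem in which only the array of block `bid` changes."""
    block = dict(mem[bid])
    block[off] = value
    return {**mem, bid: block}


def run_source(r_bid):
    """Execute lines 3, 5 and 6 of f for one concrete choice of r's block id."""
    mem = {Q: {}, P: {}, R_OWN: {}, EXTRA: {}}
    mem = store(mem, Q, 0, 42)       # line 3: *q = 42
    mem = store(mem, r_bid, 0, 37)   # line 5: *r = 37
    return mem[Q][0]                 # line 6: return *q


# r may live in p's block or in its own block, but never in q's block.
results = {r_bid: run_source(r_bid) for r_bid in (P, R_OWN)}
```

In both cases the array for block 100(2) is untouched by the store through r, so the load at line 6 yields 42 without any reasoning about offsets.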


3 LLVM’s Memory Model 


In this section, we give a brief introduction to LLVM’s memory model [24]. In 
this paper we only consider logical pointers (i.e., integer-to-pointer casts are not 
supported) and a single address space. 


Memory Block. A memory block is the unit of memory allocation: each stack or 
global variable has a distinct block, and heap allocation functions like malloc 
create a fresh block each time they are called. Each block is uniquely identified 
with a non-negative integer (bid), and has associated properties, including size, 
alignment, whether it can be written to, whether it is alive, allocation type (heap, 
stack, global), physical address, and value. 


Pointer. A pointer is defined as a triple (bid, off, attrs), where off is an offset 
within the block bid, and attrs is a set of attributes that constrain dereference- 
ability and which operations are allowed. 

Pointer arithmetic operations (gep) only change the offset, with bid and attrs 
being carried over. Unlike C, an offset is allowed to go out-of-bounds (OOB). Such
a pointer, however, cannot be dereferenced, as in C (doing so triggers undefined
behavior, UB), but it can be used, for example, in pointer comparisons.

LLVM supports several pointer attributes. For example, a readonly pointer 
p cannot be used to store data. However, it is possible to use a non-readonly 
pointer q to store data to the same location as p (provided the block is writable). 
A nocapture pointer cannot escape from a function. For example, when a func- 
tion returns, no global variable may have a nocapture pointer stored (otherwise 
it is UB). 

LLVM has three constant pointers. The null pointer is defined as (0, 0, ∅).
Block 0 is defined as zero sized and not alive. The undef² pointer is defined
as (β, δ, ∅), with β, δ being fresh variables for each observation of the pointer.
There is also a poison³ pointer.


Instructions. We consider the following LLVM memory-related instructions: 


— Memory access: load, store 
— Memory allocation: malloc, calloc, realloc, alloca (stack allocation) 
— Lifetime: start lifetime (for stack blocks), free (stack/heap deallocation) 


2 In LLVM, undef values are arbitrary values of a given type with the additional
property that they can yield a different value each time they are observed. undef 
values can be replaced with any value of the same type, except poison values. 

3 A poison value taints whole expression trees (e.g., poison + 1 = poison), and 
branching on it is UB. Similarly, dereferencing a poison pointer is UB. 
— Pointer-related: gep (pointer arithmetic), icmp (pointer comparison)
— Library functions: memcpy, memset, memcmp, strlen 
— Others: ptrtoint (pointer-to-integer cast), call (function call). 


Unsupported memory instructions are: integer-to-pointer casts, and atomic 
and volatile memory accesses. 


4 Encoding Memory Blocks and Pointers in SMT 


We describe our new encoding of LLVM’s memory model in SMT over the next 
few sections. We use the theories of UFs (uninterpreted functions), BVs (bit- 
vectors), and arrays with lambdas [7], with first order quantification. Moreover, 
we consider that the scope of verification is a single function without loops (or 
where loops have been previously unrolled). 


4.1 Memory Blocks 


Each memory block is assigned a distinct identifier (a bit-vector number). We 
further split memory blocks into local and non-local. Local blocks are all those 
that are allocated within the function under consideration, either on the stack 
or the heap. Non-local blocks are the remaining ones, including global variables, 
heap/stack allocations in callers and heap allocations in callees (stack allocations
in callees are not observable, since they are deallocated when the called function 
returns, hence there is no need to consider them). 

We use the most significant bit (MSB) to encode whether a block is local (1) 
or non-local (0). This representation allows the null block to have bid = 0 and 
be non-local. We use the term short block id to refer to bid without the
MSB. This is used in cases where it has already been established whether the 
block is local or not. Example with 4-bit block ids: 


int g;                  // bid(g) = 0001

void f(int *p) {        // bid(p) = 0xyz (with xyz = arbitrary)
  int a[2];             // bid(a) = 1000
  int *q = malloc(4);   // bid(q) = 1001
}


The separation of local and non-local block ids is an efficient way to encode 
the constraint that pointers of these groups cannot alias with each other. In the 
example above, argument p cannot alias with either a or q. 

As we only consider functions without loops, block ids can be statically 
assigned for each allocation site. 
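
The block-id layout of the example above can be sketched with plain integer operations. The helper names below are introduced for illustration; the 4-bit width and the concrete ids are taken from the example.

```python
BID_BITS = 4
MSB = 1 << (BID_BITS - 1)   # 0b1000: set for local blocks, clear for non-local ones


def make_bid(is_local: bool, short_bid: int) -> int:
    """Combine the local/non-local bit with a short block id."""
    assert 0 <= short_bid < MSB
    return (MSB if is_local else 0) | short_bid


def is_local(bid: int) -> bool:
    return bool(bid & MSB)


def short_bid(bid: int) -> int:
    """Block id without the MSB."""
    return bid & (MSB - 1)


NULL_BID = make_bid(False, 0)   # the null block is non-local with bid = 0
bid_g = make_bid(False, 1)      # global g        -> 0001
bid_a = make_bid(True, 0)       # stack array a   -> 1000
bid_q = make_bid(True, 1)       # malloc result q -> 1001

# Argument p may refer to any non-local block: 0xyz with xyz arbitrary.
candidates_p = [make_bid(False, s) for s in range(MSB)]
```

Because every candidate for p has the MSB clear, p can never be equal to the ids of a or q, which is how the non-aliasing constraint is obtained without extra solver work.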


4.2 Pointers 


A pointer ptr = (bid, off, attrs) is encoded as a single bit-vector consisting of
the concatenation of the three elements. The offset is interpreted as a signed
number (which is why blocks cannot be larger than half of the address space).
Each attribute (such as readonly) is encoded with a bit. Example with 2-bit
block ids and offsets, and a single attribute (we use . to visually separate the
elements):


void f(char readonly *p, char *q) {  // p = 0x.ab.1, q = 0y.cd.0
  char *r = p + 2;                   // r = 0x.(ab+2).1
  char *s = q + 3;                   // s = 0y.(cd+3).0
  char *t = malloc(4);               // t = 10.00.0
}
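
The concatenated pointer layout can be mimicked with integer shifts and masks. The sketch below uses the widths of the example (2-bit block id, 2-bit offset, 1 attribute bit); the function names are hypothetical, and wrapping the offset modulo its width is an assumption that mirrors bit-vector addition.

```python
BID_BITS, OFF_BITS, ATTR_BITS = 2, 2, 1


def pack(bid: int, off: int, attrs: int) -> int:
    """Concatenate bid.off.attrs into one integer (bid in the high bits)."""
    off &= (1 << OFF_BITS) - 1   # offsets wrap like bit-vector arithmetic
    return (bid << (OFF_BITS + ATTR_BITS)) | (off << ATTR_BITS) | attrs


def unpack(ptr: int):
    attrs = ptr & ((1 << ATTR_BITS) - 1)
    off = (ptr >> ATTR_BITS) & ((1 << OFF_BITS) - 1)
    bid = ptr >> (OFF_BITS + ATTR_BITS)
    return bid, off, attrs


def signed_off(off: int) -> int:
    """Interpret the offset field as a signed (two's complement) number."""
    return off - (1 << OFF_BITS) if off & (1 << (OFF_BITS - 1)) else off


def gep(ptr: int, delta: int) -> int:
    """Pointer arithmetic: only the offset changes; bid and attrs carry over."""
    bid, off, attrs = unpack(ptr)
    return pack(bid, off + delta, attrs)
```

For instance, `pack(0b10, 0, 0)` yields the bit pattern 10.00.0 given for `t` in the example.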


Let the short offset be a truncated offset where the least significant bits
corresponding to the greatest common divisor of the alignment and sizes of all
memory operations are removed. For example, if all operations are 4-byte aligned
and they access either 4- or 8-byte values, then the short offset has 2 fewer bits
than off (as these are guaranteed to be always zero when accessing the memory).
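
The number of offset bits that can be dropped follows directly from the greatest common divisor. A minimal sketch, with hypothetical helper names; taking the largest power of two that divides the gcd is an assumption of this sketch for the case where the gcd is not itself a power of two.

```python
from functools import reduce
from math import gcd


def dropped_offset_bits(alignments, sizes) -> int:
    """Number of low offset bits that are always zero for the given accesses."""
    g = reduce(gcd, list(alignments) + list(sizes))
    if g <= 0:
        raise ValueError("alignments and sizes must be positive")
    bits = 0
    while g % 2 == 0:   # count factors of two in the gcd
        g //= 2
        bits += 1
    return bits


def short_offset(off: int, bits: int) -> int:
    """Truncate an offset by removing its `bits` least significant bits."""
    return off >> bits
```

For the example in the text (4-byte alignment, 4- and 8-byte accesses) the gcd is 4, so `dropped_offset_bits([4], [4, 8])` is 2.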


4.3 Block Properties 


Each block has seven associated properties: size, alignment, read-only, liveness, 
allocation type (heap, stack, global), physical address, and value. Block proper- 
ties are looked up and updated by memory operations. For example, when doing 
a store, we need to check if the access is within the bounds of the block. 

Except for liveness and value, properties are fixed at allocation time. Liveness 
is encoded with a bit-vector (one bit per block), and value with arrays (indexed 
on off). We use a multi-memory encoding, where we have one array per bid. 

The encoding of fixed properties differs for local and non-local blocks. For 
non-local blocks, we use a UF symbol per property, taking bid as argument. 
For local blocks, we cannot use UFs because for the refinement check some of 
these would have to be quantified (cf. Sect. 7) and most, if not all, SMT solvers
do not support quantification of UF symbols. Therefore, we encode each of the 
remaining properties of local blocks as an if-then-else (ITE) expression, which is 
tailored for each use (e.g., each time an operation needs to look up a local block’s
size, we build an ITE expression for the given bid). 

Using ITE expressions to encode properties is less concise than using UFs. 
However, it is not a disaster for two reasons. Firstly, we only need to consider 
the local blocks that have been allocated beforehand, since the program cannot 
access blocks allocated afterward. Secondly, pointers are usually not fully arbi- 
trary. Oftentimes we know statically which type of block they refer to, and even 
what is the block id, given that pointer arithmetic operations do not change the 
block id. Therefore, the ITE expressions are usually small in practice. Example 
with 4-bit block ids and offsets of a source program: 


int g;                 // g = 0001.0000, size_src(001) = 4
void f() {
  char p[2];           // p = 1000.0000
  char q[3];           // q = 1001.0000
  char *r = ... p or q or g ...
  r[2] = 0;
  char t[1];           // t = 1010.0000
}


The store in this program is only well defined if the size of the block pointed
to by r is greater than 2. This is encoded in SMT as follows:


ite(islocal(r), ite(bid(r) = 0, 2, 3), size_src(bid(r))) > 2


Function islocal(p) is encoded with the SMT extract expression to fetch the 
MSB of the pointer. Similarly, bid(p) extracts the relevant bits from a pointer. 
The expression for local blocks only needs to consider local blocks 0 and 1, since 
block 2 (t) is only allocated afterward. This allows a simple single pass through 
the code to generate optimized ITE expressions. 
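
The size lookup above can be mimicked in plain C to make the shape of the ITE
chain concrete. This is only an illustrative sketch: the 8-bit pointer layout
(1 local bit, 3 block-id bits, 4 offset bits), the helper names, and the table
standing in for the uninterpreted function size_src are assumptions made for
this example, not part of the actual encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this example: [local bit | 3-bit bid | 4-bit offset]. */
static bool is_local(uint8_t ptr) { return ((ptr >> 7) & 1u) != 0; }
static unsigned bid(uint8_t ptr) { return (ptr >> 4) & 0x7u; }

/* Stand-in for the uninterpreted function size_src of non-local blocks.
   Only the global g (bid 1, 4 bytes) is known in the example. */
static unsigned size_src(unsigned b) { return b == 1 ? 4u : 0u; }

/* ITE chain over the local blocks allocated before the store:
   p (local bid 0, 2 bytes) and q (local bid 1, 3 bytes).
   Block t is allocated later and is therefore not considered. */
static unsigned block_size(uint8_t r) {
    return is_local(r) ? (bid(r) == 0 ? 2u : 3u) : size_src(bid(r));
}

/* The store r[2] = 0 is well defined only if the block has more than 2 bytes. */
static bool store_r2_ok(uint8_t r) { return block_size(r) > 2; }
```

With these definitions, the store is rejected when r is p (0x80) and accepted
when r is q (0x90) or g (0x10).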


Value. Value is defined as an array from short offset to byte (described later 
in Sect. 6.1). For non-local blocks, only those that are constant are initialized 
with the respective value. The remaining blocks are allowed to take almost any 
value. The exception is for pointers: non-local blocks cannot initially have local 
pointers stored, since the calling environment cannot fabricate local pointers. 

Local blocks are initialized with poison values using a constant array (i.e., 
an array that yields the same value for all indexes). 


4.4 Physical Addresses 


If a program observes addresses (through, e.g., pointer-to-integer casting), we 
need additional constraints to ensure that addresses of blocks that overlap in 
time are disjoint. Since we are doing translation validation, we have two programs 
with potentially different sets of locally allocated blocks. Therefore, we need to 
ensure that non-local blocks’ addresses are disjoint from those of local blocks of 
both programs. This makes the disjointness constraints quite complex. 

As an optimization, we split the address space in two: local blocks have 
MSB = 1 and non-locals have MSB = 0. Since the encoding of address disjointness 
is quadratic in the worst case (cross-product of blocks), halving the number of 
blocks is significant. This optimization, however, is an under-approximation of 
the program’s behavior (Sect. 9). After investigating LLVM’s optimizations, we 
believe it is highly unlikely this approximation will cause false negatives. 

If a program does not observe any pointer’s physical address, neither the
block’s physical address property nor the disjointness axioms are instantiated.
However, when dereferencing a pointer, we still need to check that the physical
address is sufficiently aligned. When physical addresses are not created, we
resort to checking the alignment of both the pointer’s block and its offset.
Since in this case physical addresses are not observed (and therefore not
constrained by the program using, e.g., pointer comparisons), a block’s physical
address can take any value, and therefore the block and the offset must both be
sufficiently aligned to ensure that physical pointers are aligned in all program
executions. This argument justifies why we can soundly discard physical
addresses.
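
The difference between the two alignment checks can be sketched in C as
follows. The function names are hypothetical and alignments are assumed to be
powers of two; the second function is the address-free check described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* With a concrete physical address, only the final address matters. */
static bool aligned_with_address(uint64_t block_addr, uint64_t off,
                                 uint64_t align) {
    return (block_addr + off) % align == 0;
}

/* Without physical addresses, the block may be placed anywhere its alignment
   allows, so both the block alignment and the offset must be multiples of the
   required alignment for the access to be aligned in every execution. */
static bool aligned_without_address(uint64_t block_align, uint64_t off,
                                    uint64_t align) {
    return block_align % align == 0 && off % align == 0;
}
```

For a 2-byte aligned block accessed at offset 2 with a 4-byte requirement, the
address-free check fails: the block could sit at address 2 (aligned access) or
at address 4 (misaligned access), so the access is not aligned in all executions.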

Table 2. Comparison of two semantics for pointer comparison.


                                          | Integer comparison | Non-deterministic
Fold p = q to false if p.bid ≠ q.bid      | No                 | Yes
Fold p + i = q + i to p = q               | Yes                | No
Fold (int)p = (int)q to p = q             | Yes                | No
Fold p ≤ q ∧ p ≠ q to p < q               | Yes                | No
Fold p < q ∧ q ≠ null to p < q            | Yes                | Potentially
Run-time aliasing checks                  | Yes                | Correct, but not useful
Analysis of pointers cast from integers   | Harder             | Easy


4.5 Pointer Comparison 


Given two pointers p and q, if a program learns that q is placed right after 
p in memory, the program can potentially change the contents of q without 
the compiler realizing it. Detecting the existence of such code is impossible in 
general, hence restricting the ways a program can learn the layout of objects in 
memory is important to make pointer analyses fast yet precise. 

One way the memory layout can leak is through pointer comparison. For example,
what should p < q return if these point to different memory blocks? If it is a
well-defined operation (i.e., it simply compares their integer values), it leaks
memory layout information. An alternative is to return a non-deterministic value
to prevent layout leaks, the formal semantics of which is defined in [24].

We found that there are pros and cons of both semantics for the comparison of 
pointers of different blocks, and that neither of them covers all optimizations that 
LLVM performs. Table 2 summarizes the effects on each of the optimizations. 

We decided to implement the integer comparison semantics, as LLVM performs
all the optimizations above and its alias analyses (AA) mostly give up
when they encounter an integer-to-pointer cast. In summary, we have to remove
the first optimization from LLVM to make it sound. Additionally, we make it
harder to improve LLVM’s AA algorithms w.r.t. pointers cast from integers.
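
To see why the first fold in Table 2 is unsound under the integer comparison
semantics, consider the following C sketch. The logical pointer struct and the
block placement (block 1 starting right after the 4-byte block 0) are
assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Logical pointer: block id plus offset within the block. */
typedef struct { unsigned bid; uint64_t off; } Ptr;

/* Assumed layout: block 0 occupies [16, 20) and block 1 starts at 20. */
static const uint64_t block_addr[2] = {16, 20};

/* Pointer-to-integer conversion under the assumed layout. */
static uint64_t to_int(Ptr p) { return block_addr[p.bid] + p.off; }

/* Integer-comparison semantics: compare the physical addresses. */
static bool ptr_eq(Ptr p, Ptr q) { return to_int(p) == to_int(q); }
```

A one-past-the-end pointer into block 0 (bid 0, offset 4) compares equal to
the start of block 1 (bid 1, offset 0), even though the block ids differ, so
folding the comparison to false would change the program’s result.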


4.6 Bounding the Maximum Number of Blocks 


Since we assume that programs do not have loops, we can statically bound the 
maximum number of both local and non-local blocks a program may observe. 

The maximum numbers of local blocks in the source and target programs,
$N^{src}_{local}$ and $N^{tgt}_{local}$ respectively, are computed by counting
the number of heap and stack allocation instructions. Note that this is an upper
bound because not all allocation sites may be reachable in practice.

For non-local blocks, we cannot see their definitions as with local blocks, 
except for global variables. Nevertheless, we can still bound the maximum num- 
ber of observed blocks. It is sufficient to count the number of instructions that 
may return non-local pointers, such as function calls and pointer loads. In addi- 
tion, we consider a null block when needed (if the null pointer may be observed). 

To encode the behavior of source and target programs, we need
$N^{src}_{nonlocal} + N^{tgt}_{nonlocal}$ non-local blocks in the worst case, as
all referenced pointers may be distinct. However, correct transformations will
not have the target program observe more blocks than the source. If the target
observes a pointer to a non-local block that was not observed in the source, we
can set that pointer to poison because its value is not restricted by the source.
Therefore, $N^{src}_{nonlocal}$ non-local blocks are sufficient to allow the
target to exhibit an incorrect behavior.

The bit-width of bid is derived from
$w'_{bid} = \lceil \log_2(\max(N^{src}_{local}, N^{tgt}_{local}, N^{src}_{nonlocal})) \rceil$.
When only local or only non-local pointers are used, $w_{bid} = w'_{bid}$, as we
know statically whether the pointer is local or not. Otherwise,
$w_{bid} = w'_{bid} + 1$.
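
As a small worked sketch of this computation in C (the function and parameter
names are hypothetical; the counts are assumed to be at most 2^31):

```c
#include <stdbool.h>

/* Smallest w such that 2^w >= n, i.e., ceil(log2(n)) for 1 <= n <= 2^31. */
static unsigned ceil_log2(unsigned n) {
    unsigned w = 0;
    while (w < 31 && (1u << w) < n) ++w;
    return w;
}

static unsigned max3(unsigned a, unsigned b, unsigned c) {
    unsigned m = a > b ? a : b;
    return m > c ? m : c;
}

/* Bit-width of bid: one extra bit distinguishes local from non-local blocks
   when a program uses both kinds of pointers. */
static unsigned bid_width(unsigned n_local_src, unsigned n_local_tgt,
                          unsigned n_nonlocal_src,
                          bool uses_local, bool uses_nonlocal) {
    unsigned w = ceil_log2(max3(n_local_src, n_local_tgt, n_nonlocal_src));
    return (uses_local && uses_nonlocal) ? w + 1 : w;
}
```

For instance, with 3 source local blocks, 2 target local blocks, and 2 non-local
blocks, the sketch yields 2 bits when only one kind of pointer is used and
3 bits otherwise.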


5 Memory Allocation 


In LLVM, memory blocks can be allocated on the stack (alloca), in the heap 
(e.g., malloc, calloc, etc.), or as global variables. It is surprisingly non-trivial to 
find a semantics for memory allocations that allows all of LLVM’s optimizations, 
and rejects undesired transformations. For example, we have to support alloca- 
tion removal and splitting, introduce new stack allocations and new constant 
global variables, etc. We explore multiple semantics and show their merits and 
shortcomings in the context of proving correctness of program transformations. 


5.1 Heap Allocation 


Heap allocation is done through functions such as malloc, calloc, C++’s new
operator, etc. We describe semantics for malloc; remaining functions can be
described in terms of it.

First of all, it is important to note that there are two common idioms used 
in practice by C programmers when doing memory allocation: 


int *p = malloc(4); int *p = malloc(4); 
*p = 0; if (p) { *p = 0; } 


In some programs, like the example on the left, malloc is assumed to never
return null; we call this the non-null assumption. This is mainly because the
program does not consume too much memory and it is expected that the computer
has enough memory/swap space. In other programs, like the one on the right,
malloc is expected to sometimes return null; we call this the may-null
assumption. Therefore, the program performs null-ness checks.

Since both programming styles are prevalent, we would like optimizations to 
be correct for both. This is non-trivial, as the two assumptions are conflicting: 
with the non-null assumption, it is sound to eliminate null checks, but not with 
the may-null assumption. We now explore several possible semantics to find one 
that works for both programming styles. 

A. Malloc always succeeds. Based on the non-null assumption, in this semantics
we only consider executions where there is enough space for all allocations
to succeed. Regardless of whether the target uses more or less memory than
the source, all calls to malloc yield non-null pointers. Therefore, for example,
deleting unused malloc calls is allowed.

However, removing null checks of malloc is also allowed in this semantics. 
For example, optimizing the right example above into the left one is sound. This 
transformation, however, is obviously undesirable. 


B. Malloc only succeeds if there is enough free space. To solve the problem just 
described, based on the may-null assumption, we can simulate the behavior of 
dynamic memory allocation and define malloc to return a pointer to a newly 
created block if there is an empty space in memory, and null otherwise. This 
semantics prevents the removal of null checks of malloc as it may return null. 
However, this semantics does not explain removal of unused allocations. It 
aligns both source and target programs’ allocations such that any change in the 
allocation sequence disrupts the program alignment and thus makes verification 
fail. For example, the following transformation removing unused malloc instruc- 
tions and replacing comparisons of their output with null is not supported: 


int *x = malloc(4);                // remove x (unused)
if (x != nullptr) { ... }    =>    if (true) { ... }


In case there were 0 bytes left in memory, x would be null, but since LLVM 
assumes that the program cannot observe the state of the allocator it folds 
the comparison x != nullptr to true after eliminating the allocation. This 
optimization would be flagged as incorrect in this semantics. 

LLVM assumes very little about the run-time behavior of memory allocators. 
This is to support, for instance, garbage collectors, where an allocation may fail 
but if repeated it may succeed because memory was reclaimed in between. This 
explains why LLVM folds comparisons with null of unused memory blocks, and 
also contradicts the linear view of allocations of this semantics. 


C. Malloc non-deterministically returns null. This semantics abstracts 
the behavior of the memory allocator by (1) allowing malloc to non- 
deterministically return null even if there is available space, and (2) only consid- 
ering executions where there is enough space for all allocations to succeed. This 
semantics prevents the removal of null checks of malloc, which fixes the short- 
comings of semantics A, and also allows the removal of unused allocations, which 
fixes those of semantics B. However, this semantics is too weak and therefore 
allows other undesirable transformations, like the following: 


p = malloc(4);
*p = 0;           =>    exit();

For the sake of proving refinement (Sect. 7), we need just one trace triggering
UB (i.e., one particular realization of the non-deterministic choices) for a given
input to be able to transform the source program into anything for that input.
Informally speaking, refinement always picks the worst-case execution for each
input. Since the source program executes UB when p is null, it is correct to
transform the source into any program although that is obviously undesirable.


                   MSB                                              LSB
Pointer byte:      1 | p? | Pointer representation | Byte offset
Non-pointer byte:  0 | Poison bits | Integral value | Padding


Fig. 1. Bit-wise representation of a byte. A pointer byte is poison if ‘p?’ is zero. A
non-pointer byte tracks poison bit-wise.

This semantics is too weak in practice since many programs are written
without null checks, either assuming the program will not run out of memory,
or assuming the program will terminate if it runs out of memory. It is not
reasonable in practice to allow compilers to break all such programs.


Our Solution. As we have seen, there is no single semantics that both allows all 
desired transformations and rejects undesired ones. While semantics B prevents 
desired optimizations like allocation removal, semantics A and C allow undesired 
optimizations, but in a complementary way. For example, removing null checks of 
malloc is allowed in A but not in C. On the other hand, transforming an access 
of a malloc-allocated block without a null check beforehand into arbitrary code 
is allowed in C but not in A. 

Therefore, we obtain a good semantics by requiring both A and C: an opti- 
mization is correct if it passes the refinement criteria with each of the two 
semantics. Intuitively, this definition requires the compiler to support the two 
considered coding styles: semantics A supports the non-null assumption, while 
semantics C the may-null assumption. 
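
The effect of requiring both semantics can be illustrated with a toy model in C.
Everything here is a simplification made for this sketch: behaviours are
hand-written bit sets for a single input, and refinement is reduced to "the
source may trigger UB, or the target’s behaviours are a subset of the source’s".

```c
#include <stdbool.h>

/* Observable outcomes of the toy programs, combined as a bit set. */
enum { STORE = 1, SKIP = 2, EXIT = 4, UB = 8 };

/* Behaviours of one program under semantics A (malloc never returns null)
   and under semantics C (malloc may non-deterministically return null). */
typedef struct { unsigned under_a, under_c; } Behaviours;

/* Refinement for one input: if the source may trigger UB, anything goes;
   otherwise every target behaviour must also be a source behaviour. */
static bool refines(unsigned src, unsigned tgt) {
    return (src & UB) != 0 || (tgt & ~src) == 0;
}

/* Combined criterion: refinement must hold under both A and C. */
static bool accepted(Behaviours src, Behaviours tgt) {
    return refines(src.under_a, tgt.under_a) &&
           refines(src.under_c, tgt.under_c);
}

/* Hand-written behaviour sets for the programs discussed in the text. */
static const Behaviours unchecked_store = { STORE, STORE | UB };   /* p = malloc(4); *p = 0;  */
static const Behaviours checked_store   = { STORE, STORE | SKIP }; /* if (p) { *p = 0; }      */
static const Behaviours always_store    = { STORE, STORE };        /* if (true) { ... }       */
static const Behaviours call_exit       = { EXIT, EXIT };          /* exit();                 */
```

In this model, removing the null check (checked_store to unchecked_store)
passes under A alone but fails under C, replacing the unchecked store by exit()
passes under C alone but fails under A, and removing an unused allocation
(checked_store to always_store) passes under both.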


5.2 Stack Allocation 


The semantics of alloca, the stack-allocation instruction, is slightly different 
from that of malloc. LLVM assumes that stack allocations always succeed, since 
the program will likely crash if there is a stack overflow. That is, alloca never 
returns a null pointer. 

LLVM performs more optimizations on stack allocations than on heap ones. 
For example, LLVM can split an allocation into multiple smaller ones or increase 
the alignment. These transformations can increase memory consumption. 


6 Encoding Loads and Stores in SMT 


We encode the value of memory blocks with several arrays (one per bid): from 
short offset to byte. We next give the definition of byte and the encoding of 
memory accessing instructions in SMT. 


6.1 Byte


There are two types of bytes: pointer bytes and non-pointer bytes, cf. Fig. 1. 

A pointer byte has the most significant bit (MSB) set to one. The following 
bit states whether the byte is poison or not. Next is the pointer representation 
as described in Sect. 4.2 (bid, off, attrs). 

Pointers are often longer than one byte, so when storing a pointer to memory 
we write multiple consecutive bytes. Each of these bytes records the same pointer, 
but with a different byte offset (the last bits of the byte) to distinguish between 
the partial bytes of the pointer. 

For non-pointer bytes, we track whether each of the bits is poison or not. 
This is not required for pointers, since LLVM does not allow pointer values to 
be manipulated bit-wise. Non-pointer values can be manipulated bit-wise (e.g., 
using vectors with element types smaller than 8 bits). Each bit of the integral 
value is only significant if the corresponding poison bit is zero. 


6.2 Load and Store Instructions 


Load and store instructions are trivially encoded using SMT arrays. These arrays 
store bytes as described in the previous section. We next describe how LLVM 
values are encoded to and decoded from our byte representation. 

We define two functions, $ty{\Downarrow}(v)$ and $ty{\Uparrow}(b)$, which convert
a value v into a byte array and a byte array b back to a value, respectively. We
show below $ty{\Downarrow}(v)$ when $v \neq$ poison. isz stands for the integer
type with bit-width sz. If sz is not a multiple of 8 bits, v is zero-extended
first. When v is poison, all poison bits are set to one. BitVec(n, b) stands for
number n with bit-width b. A pointer’s byte offset is 3 bits because we assume
64-bit pointers.


$isz{\Downarrow}(v)$ or $float{\Downarrow}(v) = \lambda i.\ 0 \mathbin{+\!\!+} 0^{8} \mathbin{+\!\!+} bitrepr(v)[8 \times i \ldots 8 \times (i+1) - 1] \mathbin{+\!\!+} padding$
$ty{*}{\Downarrow}(v) = \lambda i.\ 1^{2} \mathbin{+\!\!+} bitrepr(v) \mathbin{+\!\!+} BitVec(i, 3)$


$isz{\Uparrow}(b)$ and $float{\Uparrow}(b)$ return poison if any bit is poison,
or if any of the bytes is a pointer. Otherwise, these functions return the
concatenation of the integral values of the bytes.

$ty{*}{\Uparrow}(b)$ returns poison if any of the bytes is poison or not a
pointer, if there is more than one distinct pointer value in b, or if one of the
bytes has an incorrect byte offset (they have to be consecutive, from zero to
byte size minus one). An exception is reading a non-pointer zero byte, which is
interpreted as a null pointer byte. This allows initialization of, e.g., arrays
with null pointers with memset (which is an idiom commonly used in LLVM IR).
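
The integer case of these conversions can be sketched in C for a 32-bit value.
The struct-based byte below is a simplification assumed for readability (the
encoding described above packs the same fields into a single bit-vector), and
all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified byte: tag bit, per-bit poison mask, and integral value. */
typedef struct {
    bool is_ptr;      /* corresponds to the MSB of the encoded byte       */
    uint8_t poison;   /* a set bit marks the corresponding value bit poison */
    uint8_t value;    /* integral value (meaningful for non-pointer bytes) */
} Byte;

typedef struct { bool poison; uint32_t value; } I32;

/* Store direction: byte i receives bits 8*i .. 8*(i+1)-1 of the value.
   A poison value sets all poison bits to one. */
static void i32_to_bytes(I32 v, Byte out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i].is_ptr = false;
        out[i].poison = v.poison ? 0xFF : 0x00;
        out[i].value = (uint8_t)(v.value >> (8 * i));
    }
}

/* Load direction: poison if any bit is poison or any byte is a pointer byte;
   otherwise the concatenation of the integral values. */
static I32 i32_from_bytes(const Byte in[4]) {
    I32 r = { false, 0 };
    for (int i = 0; i < 4; ++i) {
        if (in[i].is_ptr || in[i].poison != 0) r.poison = true;
        r.value |= (uint32_t)in[i].value << (8 * i);
    }
    if (r.poison) r.value = 0;  /* the value of a poison result is irrelevant */
    return r;
}
```

A store followed by a load of the same bytes returns the original value, while
marking a single bit as poison or overwriting one byte with a pointer byte makes
the whole loaded integer poison.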


6.3 Multi-array Memory 


As already described, we use a multi-array encoding for memory, with one array
per block id, each indexed on off. A simpler encoding would have used a single
array indexed on ptr. The multi-array encoding is beneficial when we can cheaply
compute small aliasing sets for each memory access. In that case, we reduce the
case-splitting work on bid that the SMT solver needs to do, and it enables further
formula simplifications like store forwarding.


Num(sz)    ::= {i | 0 ≤ i < 2^sz}
BlockID    ::= ℕ
Addr       ::= Num(64)
Offset     ::= Num(64)
PtrAttr    ::= {nocapture, readonly, readnone}
Pointer    ::= BlockID × Offset × 2^PtrAttr
Value      ::= Aggregate ⊎ Int ⊎ Pointer ⊎ Float ⊎ {poison}
Aggregate  ::= list Value
PtrByte    ::= (Pointer × {i | 0 ≤ i < 8}) ⊎ {poison}
NonPtrByte ::= Num(8) × Num(8)
Byte       ::= PtrByte ⊎ NonPtrByte
Bytes      ::= Offset → Byte
Size       ::= Num(64)
Align      ::= {i | 0 ≤ i < 64}
Kind       ::= {stack, malloc, new, global}
Writable   ::= bool
MemBlock   ::= Addr × Align × Kind × Live × Writable × Size × Bytes
Memory     ::= BlockID → MemBlock
UB         ::= bool
FinalState ::= Value × Memory × UB

ag ∈ Aggregate    pb ∈ PtrByte    nb ∈ NonPtrByte
mb ∈ MemBlock     ub ∈ UB         μ ∈ BlockID ⇀ BlockID


Fig. 2. Type definitions and variable naming conventions.


(VALUE-POISON)     $v \in Value \;\Longrightarrow\; poison \sqsupseteq^{\mu} v$
(VALUE-NONPTR)     $v \in Int \uplus Float \;\Longrightarrow\; v \sqsupseteq^{\mu} v$
(VALUE-PTR)        $p \sqsupseteq^{\mu}_{ptr} p' \;\Longrightarrow\; p \sqsupseteq^{\mu} p'$
(VALUE-AGGREGATE)  $|ag| = |ag'| \wedge \forall i.\ ag[i] \sqsupseteq^{\mu} ag'[i] \;\Longrightarrow\; ag \sqsupseteq^{\mu} ag'$
(FINAL-STATE-UB)   $(v, M, true) \sqsupseteq_{st} (v', M', ub')$
(FINAL-STATE)      $ub = ub' \wedge \exists \mu.\ (v \sqsupseteq^{\mu} v' \wedge M \sqsupseteq^{\mu}_{mem} M') \;\Longrightarrow\; (v, M, ub) \sqsupseteq_{st} (v', M', ub')$


Fig. 3. Refinement of value and final state.

The multi-array encoding may, however, end up in a larger encoding overall if
several of the accesses may alias with too many blocks. For load operations that
alias multiple blocks the resulting expression is a linear combination of the loads
of each block, e.g., ite(bid = 0, load(m0, off), ite(bid = 1, load(m1, off), ...)).
In this case, it would be more compact to use the single-array encoding. Note
that even if we do not know the specific block id, we often know whether a
pointer refers to a local or non-local block (e.g., pointers received as argument
have unknown block id, but are known to be non-local), and hence splitting the
memory in two is usually a good idea (cf. Sect. 10).

We perform several optimizations that are enabled by this multi-array
encoding. We do partial-order reduction (POR) to shrink the potential aliasing
of pointers with unknown block id. For example, consider a function with
two pointer arguments (x and y) and one global variable. We assign bid = 1 to
the global variable. Then, we stipulate that x can only alias blocks with bid ≤ 2,
which is sufficient to access the global variable or another unknown block.
Argument y is also constrained to only alias blocks with bid ≤ 3, allowing it to
alias with the global variable, the same block as x, or a different block. The same
is done for function calls that return pointers. This POR technique greatly reduces
the potential aliasing of unknown pointers without losing precision.


(POINTER)
  $p.block.live \Rightarrow p'.block.live$
  $p.offset = p'.offset$
  $(isNonLocal(\{p, p'\}) \wedge p.block.id = p'.block.id) \vee (isLocal(\{p, p'\}) \wedge p.block.id = \mu[p'.block.id])$
  $\Longrightarrow\; p \sqsupseteq^{\mu}_{ptr} p'$

(MEMORY-MAP)
  $\forall bid.\ isNonLocal(bid) \Rightarrow M[bid] \sqsupseteq^{\mu}_{blk} M'[bid]$
  $\forall bid.\ isLocal(bid) \wedge \mu[bid]\ \text{defined} \Rightarrow M[\mu[bid]] \sqsupseteq^{\mu}_{blk} M'[bid]$
  $\Longrightarrow\; M \sqsupseteq^{\mu}_{mem} M'$

(BYTE-PTR)
  $pb.byteoff = pb'.byteoff \wedge pb.ptr \sqsupseteq^{\mu}_{ptr} pb'.ptr \;\Longrightarrow\; pb \sqsupseteq^{\mu}_{byte} pb'$

(BYTE-NONPTR)
  $nb'.p \mid nb.p = nb.p \;\wedge\; nb.v \mid nb.p = nb'.v \mid nb.p \;\Longrightarrow\; nb \sqsupseteq^{\mu}_{byte} nb'$

(BYTE-ZERO)
  $isZeroByte(b) \wedge isZeroByte(b') \;\Longrightarrow\; b \sqsupseteq^{\mu}_{byte} b'$

(BYTE-POISON)
  $isPoisonByte(b) \;\Longrightarrow\; b \sqsupseteq^{\mu}_{byte} b'$

(BYTES)
  $\forall\, 0 \le i < mb.size.\ mb.bytes[i] \sqsupseteq^{\mu}_{byte} mb'.bytes[i] \;\Longrightarrow\; mb \sqsupseteq^{\mu}_{bytes} mb'$

(BLOCK)
  $mb.live = mb'.live \wedge mb.size = mb'.size \wedge mb.kind = mb'.kind \wedge mb.writable = mb'.writable$
  $mb.align \le mb'.align \wedge (mb.live \Rightarrow mb \sqsupseteq^{\mu}_{bytes} mb')$
  $\Longrightarrow\; mb \sqsupseteq^{\mu}_{blk} mb'$


Fig. 4. Refinement of memory and pointers.
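
The bounds used in the POR example can be written as a one-line rule. The
sketch below assumes that block ids 0..last_fixed_bid are statically known
(the global variable has bid = 1 in the example), reads the bounds as inclusive,
and numbers unknown pointers from 0 in order of appearance; the function names
are hypothetical.

```c
#include <stdbool.h>

/* The k-th unknown pointer (0-based) may refer to any statically known block
   or to one of the k + 1 fresh blocks introduced so far. */
static unsigned por_max_bid(unsigned last_fixed_bid, unsigned k) {
    return last_fixed_bid + 1 + k;
}

/* True if the k-th unknown pointer is allowed to alias block bid. */
static bool por_allows(unsigned last_fixed_bid, unsigned k, unsigned bid) {
    return bid <= por_max_bid(last_fixed_bid, k);
}
```

With last_fixed_bid = 1, argument x (k = 0) is limited to bid ≤ 2 and argument
y (k = 1) to bid ≤ 3, matching the example.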


7 Verifying Correctness of Optimizations 


To verify correctness of LLVM optimizations, we establish a refinement relation 
between source (or original) and target (or optimized) functions. Equivalence is 
not used due to undefined behavior and nondeterminism. Compilers are allowed 
to reduce the set of possible behaviors from the source. 

Given functions $f_{src}$ and $f_{tgt}$, sets of input and output variables
$I_{src}$/$I_{tgt}$ and $O$ (which include, e.g., memory and the return value),
and sets of non-determinism variables $N_{src}$/$N_{tgt}$, $f_{src}$ is refined
by $f_{tgt}$ iff:


$\forall I_{src}, I_{tgt}, O_{tgt}\ .\ valid(I_{src}, I_{tgt}) \wedge I_{src} \sqsupseteq I_{tgt} \wedge (\exists N_{src}\ .\ pre_{src}(I_{src}, N_{src})) \wedge$
$\quad (\exists N_{tgt}\ .\ pre_{tgt}(I_{tgt}, N_{tgt}) \wedge [\![f_{tgt}]\!](I_{tgt}, N_{tgt}) = O_{tgt})$
$\quad \implies (\exists N_{src}\ .\ pre_{src}(I_{src}, N_{src}) \wedge [\![f_{src}]\!](I_{src}, N_{src}) \sqsupseteq_{st} O_{tgt})$


Predicate $valid(I_{src}, I_{tgt})$ encodes the global precondition of the input
memory and arguments such as disjointness of non-local blocks. The functions’
preconditions, $pre_{src}$ and $pre_{tgt}$, include the constraint for
disjointness of local blocks. The existential over $pre_{src}$ in the antecedent
constrains the input such that the source function has at least one possible
execution. $\sqsupseteq_{st}$ is the refinement between final states.

Figure 2 shows the definition of final program state which is a tuple of return 
value, return memory, and UB. A memory is a function from block id to a mem- 
ory block. A memory block has seven attributes that are described in Sect. 4.3. 

Figure 3 shows the definition of refinement of value and final state. For
pointers, we cannot simply use equality because local pointers in source and
target are internal to each of the functions. Even if they have the same block
identifier, they may refer to different allocation sites in the functions
(VALUE-PTR). Similarly, the refinement of the final state should consider this
difference between local pointers. To address this, we track a mapping μ between
escaped local blocks of the two functions (described next).


7.1 Refinement of Memory 


Checking refinement of non-local memory blocks is simple as blocks are the same 
in the source and target functions (e.g., global variables have the same ids in the 
two functions). Therefore, one just needs to compare blocks of source and target 
functions with the same id pairwise. 

Checking refinement of local blocks is harder but needed when, e.g., the 
function returns a locally-allocated heap block. This is legal, but block ids in the 
two functions may not be equal as allocations may have happened in a different 
order. Therefore, we cannot simply compare local blocks with the same ids. 

To check refinement of local blocks, we need to align the two functions’
allocations, i.e., we need to find a correspondence between local blocks of the
two functions. We introduce a mapping μ ∈ BlockID ⇀ BlockID between target
and source local block ids.

Local blocks become related on function calls and return statements, which
is when local pointers may be observed. For example, if a function is called with
a pointer to a local block as the first argument, μ should relate that pointer with
the first argument of an equivalent function call in the target function.

Figure 4 gives the definition of memory refinement, $M \sqsupseteq^{\mu}_{mem} M'$,
as well as other related relations between memory blocks and pointers. The first
rule, POINTER, describes refinement between source pointer p and target pointer p’
with respect to μ. The following four rules define refinement between bytes b and
b’. In rule BYTE-NONPTR, ‘a|b’ is the bitwise OR operation, and it is used to
check the equality of only those bits that are not poison. Predicate isZeroByte(b)
holds if b is a null pointer or if it is a zero-valued non-pointer byte. This is needed
because stores of null pointers can be optimized to memset instructions.
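
As an illustration of the bitwise-OR trick in rule BYTE-NONPTR, the following C
sketch checks refinement between two non-pointer bytes. The struct layout and
function name are assumptions for this example; a set bit in the mask p marks
the corresponding value bit as poison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-pointer byte: p is the per-bit poison mask, v the integral value. */
typedef struct { uint8_t p; uint8_t v; } NonPtrByte;

/* src is refined by tgt if (1) tgt has no poison bit that src lacks, and
   (2) the values agree on every bit that is not poison in src. OR-ing with
   src.p forces the source's poison bits to one on both sides, so they are
   ignored by the comparison. */
static bool nonptr_byte_refines(NonPtrByte src, NonPtrByte tgt) {
    bool poison_shrinks = (uint8_t)(tgt.p | src.p) == src.p;
    bool values_agree = (uint8_t)(src.v | src.p) == (uint8_t)(tgt.v | src.p);
    return poison_shrinks && values_agree;
}
```

For example, a source byte whose low nibble is poison may be refined by a
target byte with any low nibble, as long as the high nibble is unchanged and
the target introduces no new poison bits.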

Rules BYTES and BLOCK define refinement between memory blocks’ values
and memory blocks, respectively. Rule MEMORY-MAP describes memory refinement
with respect to the local block mapping μ. M[bid] stands for the memory block
with block id bid.

The well-formedness of μ is established in the refinement rules for function
calls and return statements. We show these for function calls in the next section.
We note that there might be multiple well-formed μ due to non-determinism.
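

(NONPTR-ARG)
  $v, v' \notin Pointer \wedge v \sqsupseteq^{\mu} v' \;\Longrightarrow\; v \sqsupseteq^{\mu,sz}_{arg} v'$

(PTR-ARG-MAPPED)
  $p \sqsupseteq^{\mu}_{ptr} p' \;\Longrightarrow\; p \sqsupseteq^{\mu,sz}_{arg} p'$

(PTR-ARG-UNMAPPED)
  $isLocal(\{p, p'\}) \wedge p.offset = p'.offset \wedge M[p.bid] \sqsupseteq^{\mu}_{blk} M'[p'.bid] \;\Longrightarrow\; p \sqsupseteq^{\mu,sz}_{arg} p'$

(PTR-ARG-BYVAL)
  $sz > 0 \wedge o = p.offset \wedge o' = p'.offset \wedge mb = M[p.bid] \wedge mb' = M'[p'.bid]$
  $\forall\, 0 \le i < sz.\ mb.bytes[o + i] \sqsupseteq^{\mu}_{byte} mb'.bytes[o' + i]$
  $\Longrightarrow\; p \sqsupseteq^{\mu,sz}_{arg} p'$


Fig. 5. Refinement between function arguments.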


8 Function Calls 


A call to an unknown function may change the memory arbitrarily (except for, 
e.g., constant variables and non-escaped local blocks). The outputs in the source 
and target are, however, related: if the target’s inputs refine those of the source, 
refinement holds between their outputs as well. Alive2 already supported func- 
tion calls; this section shows how it was extended to support memory. 

Let $(M_{in}, v_{in})$ and $(M_{out}, v_{out})$ be the input and output of a
function call in the source, and their primed versions, $(M'_{in}, v'_{in})$ and
$(M'_{out}, v'_{out})$, those of a function call in the target. Let $\mu_{in}$ be
a local block mapping before executing the calls. To state that the outputs are
refined if the inputs are refined, we add the following formula to the target’s
precondition:


$(M_{in} \sqsupseteq^{\mu_{in}}_{mem} M'_{in} \wedge \forall i.\ v_{in}[i] \sqsupseteq^{\mu_{in}, sz[i]}_{arg} v'_{in}[i]) \implies (M_{out} \sqsupseteq^{\mu_{out}}_{mem} M'_{out} \wedge v_{out} \sqsupseteq^{\mu_{out}} v'_{out})$


A call to a function with a pointer to a local block as argument escapes this
block, as the callee may, e.g., store that pointer to a global variable. Moreover,
any pointer stored in this block also escapes as the callee may traverse the block
and grab any pointer stored there, and do so transitively. The updated mapping
$\mu_{out} = extend(\mu_{in}, M_{in}, M'_{in}, v_{in}, v'_{in})$ returns
$\mu_{in}$ updated with the relationship between the newly escaped blocks in
source and target functions.

Figure 5 shows the definition of refinement between function call arguments
in source and target programs. The first rule relates non-pointer arguments.
The second one handles pointers that have escaped before these calls. The third
rule handles local pointers of blocks that did not escape before these calls, and
therefore we need to check if the contents of these blocks are refined.

The fourth refinement rule handles byval pointer arguments. These argu- 
ments get a freshly allocated block and the contents of the pointer are copied 
from the pointer’s offset onwards. 


9 Approximating Program Behavior 


In order to speed up verification, we approximate programs’ behaviors, which can
result in false positives and false negatives. We believe none of these
approximations has a significant impact for two reasons: (1) we only need to be
as precise as LLVM’s static analyses, i.e., we do not need to support arbitrary
optimizations, and (2) we do not consider the compiler to be malicious (which may
not be true in certain contexts). Moreover, we conducted an extensive evaluation
to support these claims, on which we report in the next section.


Under-Approximations 


1. Physical addresses of local memory blocks have the MSB set to 1, and non- 
locals set to 0. This is reasonable if we assume the compiler is not malicious 
and therefore will not exploit our approximation. 

2. We do not consider the case where a (portion of a) global variable is initially 
undef, only poison or a regular value. 

3. Library functions strlen, memcmp, and bcmp are unrolled for a constant 
number of times. A precondition is added to constrain the input to be smaller 
than the unroll factor. In the case of strlen, the input pointer is often a 
constant array. We compute the result straight away in this case. 
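
The bounded unrolling of library functions in item 3 can be pictured with the
following C sketch of strlen. The unroll factor of 4 and the function name are
assumptions made for illustration; the boolean output plays the role of the
added precondition.

```c
#include <stdbool.h>
#include <stddef.h>

#define UNROLL 4  /* hypothetical unroll factor */

/* Bounded strlen: inspects at most UNROLL bytes. The precondition holds only
   if the string terminates within the unrolled iterations, i.e., its length
   is smaller than UNROLL; longer inputs are excluded from verification. */
static size_t strlen_unrolled(const char *s, bool *precondition) {
    for (size_t i = 0; i < UNROLL; ++i) {
        if (s[i] == '\0') {
            *precondition = true;
            return i;
        }
    }
    *precondition = false;
    return 0;
}
```

Inputs longer than the unroll factor make the precondition false, which is why
this is an under-approximation: behaviors on such inputs are never examined.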


Over-Approximations. The set of local blocks that escape (e.g., whose address 
is stored into a global variable) is computed per function. This may over- 
approximate the set of escaped pointers at times because, e.g., a pointer may 
only escape in a particular branch. LLVM also computes the set of escaped 
pointers per function. 


10 Evaluation 


We implemented our new memory model in Alive2 [30]. The implementation of
the memory model consists of about 3.0 KLoC plus an additional 0.4 KLoC for
static analyses for optimization.

We ran two sets of experiments to both validate our implementation and
the formal semantics, and to identify bugs in LLVM. First, we did translation
validation of LLVM’s unit tests (test/Transforms) to increase confidence that
we match LLVM’s behavior in practice. Second, we ran five benchmarks: bzip2,
gzip, oggenc, ph7, and SQLite3.

Benchmarks were compiled with -O3. Moreover, we disabled type-based aliasing
because there is no formal model for this feature yet. During compilation, we
emitted pairs of IR files before and after each intra-procedural optimization. We
discarded syntactically equal pairs as well as pairs without memory operations.

We used a machine with two Intel Xeon E5-2630 v2 CPUs (total of 12 cores). 
We set Z3’s timeout to 1 min and memory limit to 1 GB. Loops were unrolled 
once. We used LLVM from 11/Dec (5e31e22) and Z3 [33] from 16/Dec (11477f). 


10.1 LLVM Unit Tests 


LLVM’s Transforms unit test suite consists of 6,600 tests totaling 36,600
functions. Alive2 takes about 2.5 h (in parallel) to validate these. By running
LLVM’s unit tests, we found 21 new bugs in memory optimizations.

Table 3. Statistics and results for the single-file benchmarks.


Program | LoC  | Pairs | Time (hours) | Correct | Incorrect | TO   | OOM | Unsupported
bzip2   | 5.1k | 2.3k  | 1.9          | 316     | 9         | 574  | 175 | 1.2k
gzip    | 5.3k | 2.6k  | 2.0          | 908     | 4         | 922  | 45  | 737
oggenc  | 48k  | 1.8k  | 2.0          | 433     | 5         | 617  | 49  | 701
ph7     | 43k  | 5.6k  | 3.4          | 1.2k    | 23        | 1.5k | 15  | 2.8k
sqlite3 | 141k | 12k   | 7.5          | 2.2k    | 38        | 2.2k | 48  | 7.8k


We show below an example of a bug we found. This optimization was shrink- 
ing the store from 64 to 32 bits, which is incorrect since the last 32 bits were not 
copied. This happened because of the mismatch in the load/store’s sizes. 


// i32 *x, *y, *z;                       // i32 *x, *y, *z;
i32 *p = (*x < *y ? x : y);        =>    i32 r = (*x < *y ? *x : *y);
*(i64*)z = *(i64*)p;                     *z = r;


10.2 Benchmarks 


Table 3 shows the statistics and results for translation validation. The Pairs
column indicates the number of source/optimized function pairs considered for
validation. We discarded pairs where the two functions were syntactically equal,
as the transformation is then trivially correct. The last column indicates the
number of pairs skipped because they use features Alive2 does not yet support.

All the 79 incorrect pairs are due to mismatches between LLVM and the
formal semantics. Of these, 74 are related to incorrect handling of undef and
poison values, and the remaining 5 are caused by incorrect load type punning
optimizations. This shows that our tool has no false positives.


10.3 Specification Bugs 


While testing our tool, we found a mismatch in the semantics of the nonnull 
attribute between LLVM’s documentation and LLVM’s code. The documenta- 
tion specified that passing a null pointer to a nonnull argument triggered UB. 
However, as illustrated below, LLVM adds nonnull to a pointer that may be 
poison. This is incorrect because poison can be optimized into any value includ- 
ing null. 


p = gep inbounds q, 1      =>      p = gep inbounds q, 1
f(p)                               f(nonnull p) ; UB if p poison
We proposed a new semantics to the LLVM developers, where non-conforming
pointers would be considered poison rather than UB. This was
accepted and we have contributed patches to fix the docs and the incorrect 
optimizations. 


10.4 Alias Sets 


To show that splitting the memory into multiple arrays is beneficial, we gathered 
statistics of the alias sets in our benchmarks. More than 96% of the dereferenced 
pointers turned out to be only local or non-local, but not both. This shows that 
splitting the memory into local and non-local simplifies the memory encoding. 
We also counted the number of memory blocks pointers may alias with. Half 
of the pointers were aliased with just one block. About 80% of the pointers 
aliased with at most 3 blocks. This is far fewer than the number of blocks a
typical function has: the median number of memory blocks was between 7 and 13
(varying over programs), and only 10% of the functions had fewer than 3 blocks.


11 Related Work 


Semantics of LLVM IR. The official LLVM IR’s specification is written in 
prose [1]. Vellvm [47] and K-LLVM [29] formalized large subsets of the IR in 
Coq and K, respectively. [26] clarifies the semantics of undef and poison and 
proposes a new freeze instruction. [24] formalizes various memory instructions 
of LLVM. [32] presents a C memory model that supports compilation to that 
LLVM model. 


Translation validation. [38] presents a translation validation infrastructure for 
GCC’s intermediate language, using a set of arithmetic/aliasing rules for show- 
ing equivalence. LLVM-MD [44] and Peggy [42] verify LLVM optimizations by 
showing equivalence of source and targets with rewrite rules/equality axioms. 
They suffer, however, from incomplete axioms for aliasing. 

In order to simplify the work of translation validation tools, it is possible 
to extend the compiler to produce hints (witnesses) [18,36,38,41]. One of these 
tools, Crellvm [20], is formally verified in Coq. 


Verifying programs with memory using SMT solvers. SMT solvers have been 
used before to check equivalence of programs with memory [11,14,21,25,31]. [12] 
give an encoding of some (but not all) aliasing constraints needed to do transla- 
tion validation of assembly generated by C compilers. 

Other memory models encoded in SMT include one for Solidity (Ethereum
smart contracts) [16], and for separation logic [37,39]. Several verification tools 
include SAT/SMT-based (partial) memory models for C [2,9,10] and Java [43]. 

Several automatic software verification tools, often based on CHCs (con- 
strained Horn clauses), support memory programs [6, 13]. For example, both Sea- 
Horn and Cascade use a field-sensitive alias analysis to split the memory [15,45]. 
SLAYER [4] is an automatic tool for analyzing memory safety of a C program
using Z3. Smallfoot [3] verifies assertions written in separation logic. 

There have been recent advances in speeding up verification of (SMT) array 
programs [17,22], from which we could likely benefit. 

CompCert [27] splits the memory into local (private) and non-local (public) 
blocks, similarly to what we do, but assumes that allocations never fail [28]. Work 
on verifying peephole optimizations for CompCert does not support memory [34]. 

To support integer-to-pointer casts in CompCert, [5] proposes extending inte- 
ger values to carry block ids as well. In this model, arithmetic on pointer values 
yields a symbolic expression. [19] makes the pointer-to-integer cast an instruction 
that assigns a physical address to the block. Neither of these models supports 
several optimizations performed by LLVM. 


12 Conclusion 


We presented the first SMT encoding of LLVM’s memory model that is suffi- 
ciently precise to validate all of LLVM’s intra-procedural memory optimizations. 

Using our new encoding, we found and reported 21 previously unknown bugs 
in LLVM memory optimizations, 10 of which have already been fixed. 
Abstract. In recent years, there has been significant progress in the
development and industrial adoption of static analyzers, specifically of 
abstract interpreters. Such analyzers typically provide a large, if not 
huge, number of configurable options controlling the analysis precision 
and performance. A major hurdle in integrating them in the software- 
development life cycle is tuning their options to custom usage scenarios, 
such as a particular code base or certain resource constraints. 

In this paper, we propose a technique that automatically tailors an 
abstract interpreter to the code under analysis and any given resource 
constraints. We implement this technique in a framework, TAILOR, which 
we use to perform an extensive evaluation on real-world benchmarks. 
Our experiments show that the configurations generated by TAILOR are 
vastly better than the default analysis options, vary significantly depend- 
ing on the code under analysis, and most remain tailored to several sub- 
sequent code versions. 


1 Introduction 


Static analysis inspects code, without running it, in order to prove properties or 
detect bugs. Typically, static analysis approximates code behavior, for instance, 
because checking the correctness of most properties is undecidable. Performance 
is another important reason for this approximation. Typically, the closer the 
approximation is to the actual code behavior, the less efficient and the more 
precise the analysis is, that is, the fewer false positives it reports. For less tight 
approximations, the analysis tends to become more efficient but less precise. 
Recent years have seen tremendous progress in the development and indus- 
trial adoption of static analyzers. Notable successes include Facebook’s Infer [7,8] 
and AbsInt’s Astrée [5]. Many popular analyzers, such as these, are based on 
abstract interpretation [12], a technique that abstracts the concrete program 
semantics and reasons about its abstraction. In particular, program states are
abstracted as elements of abstract domains. Most abstract interpreters offer a 
wide range of abstract domains that impact the precision and performance of 
the analysis. For instance, the Intervals domain [11] is typically faster but less 
precise than Polyhedra [16], which captures linear inequalities among variables. 

In addition to the domains, abstract interpreters usually provide a large 
number of other options, for instance, whether backward analysis should be 
enabled or how quickly a fixpoint should be reached. In fact, the sheer number of 
option combinations (over 6M in our experiments) is bound to overwhelm users, 
especially non-expert ones. To make matters worse, the best option combinations 
may vary significantly depending on the code under analysis and the resources, 
such as time or memory, that users are willing to spend. 

In light of this, we suspect that most users resort to using the default options 
that the analysis designer pre-selected for them. However, these are definitely 
not suitable for all code. Moreover, they do not adjust to different stages of 
software development, e.g., running the analysis in the editor should be much 
faster than running it in a continuous integration (CI) pipeline, which in turn 
should be much faster than running it prior to a major release. The alternative of 
enabling the (in theory) most precise analysis can be even worse, since in practice 
it often runs out of time or memory as we show in our experiments. As a result, 
the widespread adoption of abstract interpreters is severely hindered, which is 
unfortunate since they constitute an important class of practical analyzers. 


Our Approach. To address this issue, we present the first technique that auto- 
matically tailors a generic abstract interpreter to a custom usage scenario. With 
the term custom usage scenario, we refer to a particular piece of code and specific 
resource constraints. The key idea behind our technique is to phrase the prob- 
lem of customizing the abstract-interpretation configuration to a given usage 
scenario as an optimization problem. Specifically, different configurations are 
compared using a cost function that penalizes those that prove fewer properties 
or require more resources. The cost function can guide the configuration search of 
a wide range of existing optimization algorithms. This problem of tuning abstract 
interpreters can be seen as an instance of the more general problem of algorithm 
configuration [31]. In the past, algorithm configuration has been used to tune 
algorithms for solving various hard problems, such as SAT solving [32,33], and 
more recently, training of machine-learning models [3,18,52].

We implement our technique in an open-source framework called TAILOR¹,
which configures a given abstract interpreter for a given usage scenario using a 
given optimization algorithm. As a result, TAILOR enables the abstract inter- 
preter to prove as many properties as possible within the resource limit without 
requiring any domain expertise on behalf of the user. 

Using TAILOR, we find that tailored configurations vastly outperform the 
default options pre-selected by the analysis designers. In fact, we show that 
this is possible even with very simple optimization algorithms. Our experiments 


1 The tool implementation is found at https://github.com/Practical-Formal-Methods/tailor
and an installation at https://doi.org/10.5281/zenodo.4719604.
also demonstrate that tailored configurations vary significantly depending on
the usage scenario—in other words, there cannot be a single configuration that 
fits all scenarios. Finally, most of the generated configurations remain tailored 
to several subsequent code versions, suggesting that re-tuning is only necessary 
after major code changes. 


Contributions. We make the following contributions: 


1. We present the first technique for automatically tailoring abstract interpreters 
to custom usage scenarios. 

2. We implement our technique in an open-source framework called TAILOR. 

3. Using a state-of-the-art abstract interpreter, CRAB [25], with millions of con- 
figurations, we show the effectiveness of TAILOR on real-world benchmarks. 


2 Overview 


We now illustrate the workflow and tool architecture of TAILOR and provide 
examples of its effectiveness. 


Terminology. In the following, we refer to an abstract domain with all its 
options (e.g., enabling backward analysis or more precise treatment of arrays 
etc.) as an ingredient. 

As discussed earlier, abstract interpreters typically provide a large number of 
such ingredients. To make matters worse, it is also possible to combine different 
ingredients into a sequence (which we call a recipe) such that more properties are 
verified than with individual ingredients. For example, a user could configure the 
abstract interpreter to first use Intervals to verify as many properties as possible 
and then use Polyhedra to attempt verification of any remaining properties. Of 
course, the number of possible configurations grows exponentially in the length 
of the recipe (over 6M in our experiments for recipes up to length 3). 


Workflow. The high-level architecture of TAILOR is shown in Fig. 1. It takes 
as input the code to be analyzed (i.e., any program, file, function, or fragment), 
a user-provided resource limit, and optionally an optimization algorithm. We 
focus on time as the constrained resource in this paper, but our technique could 
be easily extended to other resources, such as memory. 

The optimization engine relies on a recipe generator to generate a fresh recipe. 
To assess its quality in terms of precision and performance, the recipe evaluator 
computes a cost for the recipe. The cost is computed by evaluating how precise 
and efficient the abstract interpreter is for the given recipe. This cost is used by 
the optimization engine to keep track of the best recipe so far, i.e., the one that 
proves the most properties in the least amount of time. TAILOR repeats this 
process for a given number of iterations to sample multiple recipes and returns 
the recipe with the lowest cost. 

Zooming in on the evaluator, a recipe is processed by invoking the abstract 
interpreter for each ingredient. After each analysis (i.e., one ingredient), the 
evaluator collects the new verification results, that is, the verified assertions. All 
Fig. 1. Overview of our framework.


verification results that have been achieved so far are subsequently shared with 
the analyzer when it is invoked for the next ingredient. Verification results are 
shared by converting all verified assertions into assumptions. After processing 
the entire recipe, the evaluator computes a cost for the recipe, which depends 
on the number of unverified assertions and the total analysis time. 
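
The evaluator loop described above can be sketched in Python as follows. This is a minimal sketch, not TAILOR's code: the `analyze` callback, its signature, and the toy analyzer with its results are hypothetical stand-ins for invoking a real abstract interpreter.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical analyzer interface: given one ingredient and the assertions
# already verified (passed as assumptions), return the assertions verified
# by this run and the time it took in seconds.
Analyzer = Callable[[str, Set[str]], Tuple[Set[str], float]]

def evaluate_recipe(recipe: Iterable[str], assertions: Set[str],
                    analyze: Analyzer) -> Tuple[int, float]:
    """Return (number of unverified assertions, total analysis time)."""
    verified: Set[str] = set()
    total_time = 0.0
    for ingredient in recipe:
        # Verified assertions are shared with the next run as assumptions.
        newly_verified, elapsed = analyze(ingredient, set(verified))
        verified |= newly_verified & assertions
        total_time += elapsed
    return len(assertions - verified), total_time

# Toy stand-in for an abstract interpreter (made-up results).
def toy_analyze(ingredient, assumptions):
    proves = {"intervals": {"a1", "a2"}, "polyhedra": {"a2", "a3"}}[ingredient]
    # Pretend that already-assumed assertions cost nothing to re-check.
    elapsed = 0.1 * len(proves - assumptions)
    return proves - assumptions, elapsed

unverified, seconds = evaluate_recipe(
    ["intervals", "polyhedra"], {"a1", "a2", "a3", "a4"}, toy_analyze)
print(unverified, round(seconds, 2))
```

The two returned quantities are the inputs that the cost of a recipe depends on.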

In general, there might be more than one recipe tailored to a particular 
usage scenario. Naively, finding one requires searching the space of all recipes. 
Section 4.3 discusses several optimization algorithms for performing this search, 
which TAILOR already incorporates in its optimization engine. 


Examples. As an example, let us consider the usage scenario where a user runs 
the CRAB abstract interpreter [25] in their editor for instant feedback during 
code development. This means that the allowed time limit for the analysis is very 
short, say, 1 s. Now assume that the code under analysis is a program file² of the
multimedia processing tool FFMPEG, which is used to evaluate the effectiveness of 
TAILOR in our experiments. In this file, CRAB checks 45 assertions for common 
bugs, i.e., division by zero, integer overflow, buffer overflow, and use after free. 
Analysis of this file with the default CRAB configuration takes 0.35 s to 
complete. In this time, CRAB proves 17 assertions and emits 28 warnings about 
the properties that remain unverified. For this usage scenario, TAILOR is able to 
tune the abstract-interpreter configuration such that the analysis time is 0.57 s 
and the number of verified properties increases by 29% (i.e., 22 assertions are 
proved). Note that the tailored configuration uses a completely different abstract 
domain than the one in the default configuration. As a result, the verification 
results are significantly better, but the analysis takes slightly longer to complete 
(although remaining within the specified time limit). In contrast, enabling the 
most precise analysis in CRAB verifies 26 assertions but takes over 6 min to
complete, which by far exceeds the time limit imposed by the usage scenario. 
While it takes TAILOR 4.5 s to find the above configuration, this is time well 
invested; the configuration can be re-used for several subsequent code versions. 
In fact, in our experiments, we show that generated configurations can remain 


2 https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/idcin.c
tailored for at least up to 50 subsequent commits to a file under version control.
Given that changes in the editor are typically much more incremental, we expect 
that no re-tuning would be necessary at all during an editor session. Re-tuning 
may be beneficial after major changes to the code under analysis and can happen 
offline, e.g., between editor sessions, or in the worst case overnight. 

As another example, consider the usage scenario where CRAB is integrated 
in a CI pipeline. In this scenario, users should be able to spare more time for 
analysis, say, 5 min. Here, let us assume that the analyzed code is a program
file³ of the CURL tool for transferring data by URL, which is also used in our
evaluation. The default CRAB configuration takes 0.23 s to run and only verifies 
2 out of 33 checked assertions. TAILOR is able to find a configuration that takes 
7.6 s and proves 8 assertions. In contrast, the most precise configuration does 
not terminate even after 15 min. 

Both scenarios demonstrate that, even when users have more time to spare, 
the default configuration cannot take advantage of it to improve the verification 
results. At the same time, the most precise configuration is completely impracti- 
cal since it does not respect the resource constraints imposed by these scenarios. 


3 Background: A Generic Abstract Interpreter 


Many successful abstract interpreters (e.g., Astrée [5], C Global Surveyor [53], 
Clousot [17], CRAB [25], IKOS [6], Sparrow [46], and Infer [8]) follow the generic 
architecture in Fig. 2. In this section, we describe its main components to show 
that our approach should generalize to such analyzers. 


Memory Domain. Analysis of low-level languages such as C and LLVM-bitcode 
requires reasoning about pointers. It is, therefore, common to design a memory 
domain [42] that can simultaneously reason about pointer aliasing, memory con- 
tents, and numerical relations between them. 

Pointer domains resolve aliasing between pointers, and array domains reason 
about memory contents. More specifically, array domains can reason about indi- 
vidual memory locations (cells), infer universal properties over multiple cells, or 
both. Typically, reasoning about individual cells trades performance for precision 
unless there are very few array elements (e.g., [22,42]). In contrast, reasoning 
about multiple memory locations (summarized cells) trades precision for per- 
formance. In our evaluation, we use Array smashing domains [5] that abstract 
different array elements into a single summarized cell. Logico-numerical domains 
infer relationships between program and synthetic variables, introduced by the 
pointer and array domains, e.g., summarized cells. 

Next, we introduce domains typically used for proving the absence of 
runtime errors in low-level languages. Boolean domains (e.g., flat Boolean, 
BDDApron [1]) reason about Boolean variables and expressions. Non-relational 
domains (e.g., Intervals [11], Congruence [23]) do not track relations among dif- 
ferent variables, in contrast to relational domains (e.g., Equality [35], Zones [41], 


3 https://github.com/curl/curl/blob/master/lib/cookie.c
Fig. 2. Generic architecture of an abstract interpreter.


Octagons [43], Polyhedra [16]). Due to their increased precision, relational 
domains are typically less efficient than non-relational ones. Symbolic domains 
(e.g., Congruence closure [9], Symbolic constant [44], Term [21]) abstract com- 
plex expressions (e.g., non-linear) and external library calls by uninterpreted 
functions. Non-convex domains express disjunctive invariants. For instance, the 
DisInt domain [17] extends Intervals to a finite disjunction; it retains the scala- 
bility of the Intervals domain by keeping only non-overlapping intervals. On the 
other hand, the Boxes domain [24] captures arbitrary Boolean combinations of 
intervals, which can often be expensive. 


Fixpoint Computation. To ensure termination of the fixpoint computation, 
Cousot and Cousot introduce widening [12,14], which usually incurs a loss of 
precision. There are three common strategies to reduce this precision loss, which 
however sacrifice efficiency. First, delayed widening [5] performs a number of 
initial fixpoint-computation iterations in the hope of reaching a fixpoint before 
resorting to widening. Second, widening with thresholds [37,40] limits the number 
of program expressions (thresholds) that are used when widening. The third 
strategy consists in applying narrowing [12,14] a certain number of times. 
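
The trade-off between widening, delayed widening, and narrowing can be seen on the loop `i = 0; while (i < 100) i = i + 1;` analyzed with Intervals. The sketch below is a textbook-style toy, not CRAB's implementation; all function names are illustrative, and it computes the interval of `i` at the loop head.

```python
import math

def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))

def leq(a, b):
    """Interval inclusion: a is contained in b."""
    return b[0] <= a[0] and a[1] <= b[1]

def widen(old, new):
    """Standard interval widening: unstable bounds jump to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)

def narrow(old, new):
    """Standard interval narrowing: only infinite bounds are refined."""
    lo = new[0] if old[0] == -math.inf else old[0]
    hi = new[1] if old[1] == math.inf else old[1]
    return (lo, hi)

def step(head):
    """Loop-head state after one more iteration of `i < 100: i = i + 1`."""
    lo, hi = head[0], min(head[1], 99)      # guard i < 100 on integers
    return join((0, 0), (lo + 1, hi + 1))   # join with the entry state i = 0

def analyze(delay, narrowing_steps):
    head = (0, 0)
    iteration = 0
    while True:
        new = step(head)
        if leq(new, head):                  # post-fixpoint reached
            break
        # Delayed widening: plain iteration for the first `delay` rounds.
        head = new if iteration < delay else widen(head, new)
        iteration += 1
    for _ in range(narrowing_steps):
        head = narrow(head, step(head))
    return head

print(analyze(delay=0, narrowing_steps=0))
print(analyze(delay=0, narrowing_steps=1))
print(analyze(delay=100, narrowing_steps=0))
```

Immediate widening loses the upper bound, whereas one narrowing step or a sufficiently long delay recovers it at the price of extra iterations.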


Forward and Backward Analysis. Classically, abstract interpreters analyze 
code by propagating abstract states in a forward manner. However, abstract 
interpreters can also perform backward analysis to compute the execution states 
that lead to an assertion violation. Cousot and Cousot [13,15] define a forward- 
backward refinement algorithm in which a forward analysis is followed by a back- 
ward analysis until no more refinement is possible. The backward analysis uses 
invariants computed by the forward analysis, while the forward analysis does not 
explore states that cannot reach an assertion violation based on the backward 
analysis. This refinement is more precise than forward analysis alone, but it may 
also become very expensive. 
Algorithm 1: Optimization engine.

 1 Function OPTIMIZE(P, r_max, l_max, i_dom, i_set, rec_init, GENERATERECIPE, ACCEPT) is
 2   // Phase 1 (optimize domains)
 3   rec_best := rec_curr := rec_init
 4   cost_best := cost_curr := EVALUATE(P, r_max, rec_best)
 5   for l := 1 to l_max do
 6     for i := 1 to i_dom · l do
 7       rec_next := GENERATERECIPE(rec_curr, l)
 8       cost_next := EVALUATE(P, r_max, rec_next)
 9       if cost_next < cost_best then
10         rec_best, cost_best := rec_next, cost_next
11       if ACCEPT(cost_curr, cost_next) then
12         rec_curr, cost_curr := rec_next, cost_next
13   // Phase 2 (optimize settings)
14   for i := 1 to i_set do
15     rec_mut := MUTATESETTINGS(rec_best)
16     cost_mut := EVALUATE(P, r_max, rec_mut)
17     if cost_mut < cost_best then
18       rec_best, cost_best := rec_mut, cost_mut
19   return rec_best


Intra- and Inter-procedural Analysis. An intra-procedural analysis analyzes 
a function ignoring the information (i.e., call stack) that flows into it, while an 
inter-procedural analysis considers all flows among functions. The former is much 
more efficient and easy to parallelize, but the latter is usually more precise. 


4 Our Technique 


This section describes the components of TAILOR in detail; Sects. 4.1, 4.2, 4.3 
explain the optimization engine, recipe evaluator, and recipe generator (Fig. 1). 


4.1 Recipe Optimization 


Algorithm 1 implements the optimization engine. In addition to the code P
and the resource limit r_max, it also takes as input the maximum length of the
generated recipes l_max (i.e., the maximum number of ingredients), a function to
generate new recipes GENERATERECIPE (i.e., the recipe generator from Fig. 1),
and four other parameters, which we explain later.

A tailored recipe is found in two phases. The first phase aims to find the 
best abstract domain for each ingredient, while the second tunes the remaining 
analysis settings for each ingredient (e.g., whether backward analysis should 
be enabled). Parameters i_dom and i_set control the number of iterations of each
phase. Note that we start with a search for the best domains since they have the
largest impact on the precision and performance of the analysis.

During the first phase, the algorithm initializes the best recipe rec_best with
an initial recipe rec_init (line 3). The cost of this recipe is evaluated with function
EVALUATE, which implements the recipe evaluator from Fig. 1. The subsequent
nested loop (line 5) samples a number of recipes, starting with the shortest
recipes (l := 1) and ending with the longest recipes (l := l_max). The inner loop
runs i_dom iterations for each ingredient in the recipe (i.e., i_dom · l iterations
in total), each invoking function GENERATERECIPE, and in case a recipe with
lower cost is found, it updates the best recipe (lines 9-10). Several optimization
algorithms, such as hill climbing and simulated annealing, search for an optimal
result by mutating some of the intermediate results. Variable rec_curr stores
intermediate recipes to be mutated, and function ACCEPT decides when to update it
(lines 11-12).

As explained earlier, the purpose of the first phase is to identify the best 
sequence of abstract domains. The second phase (lines 13-18) focuses on tuning 
the other settings of the best recipe so far. This is done by randomly mutating 
the best recipe via MUTATESETTINGS (line 15), and updating the best recipe if 
better settings are found (lines 17-18). After exploring i_set random settings, the
best recipe is returned to the user (line 19). 
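
For readers who prefer executable code, Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch under assumptions: `evaluate` is taken to close over the code P and the limit r_max, and the toy instantiation at the bottom (domain names, cost function, settings mutation) is made up for illustration.

```python
import random

def optimize(evaluate, generate_recipe, accept, mutate_settings,
             rec_init, l_max, i_dom, i_set):
    """Two-phase search mirroring Algorithm 1."""
    # Phase 1 (optimize domains)
    rec_best = rec_curr = rec_init
    cost_best = cost_curr = evaluate(rec_best)
    for l in range(1, l_max + 1):
        for _ in range(i_dom * l):
            rec_next = generate_recipe(rec_curr, l)
            cost_next = evaluate(rec_next)
            if cost_next < cost_best:
                rec_best, cost_best = rec_next, cost_next
            if accept(cost_curr, cost_next):
                rec_curr, cost_curr = rec_next, cost_next
    # Phase 2 (optimize settings)
    for _ in range(i_set):
        rec_mut = mutate_settings(rec_best)
        cost_mut = evaluate(rec_mut)
        if cost_mut < cost_best:
            rec_best, cost_best = rec_mut, cost_mut
    return rec_best

# Toy instantiation (all names and costs below are made up).
DOMAINS = ["intervals", "zones", "octagons"]
rng = random.Random(0)

def toy_evaluate(rec):
    # Pretend that every distinct domain in the recipe lowers the cost.
    return 1.0 - 0.2 * len(set(rec))

best = optimize(
    evaluate=toy_evaluate,
    generate_recipe=lambda rec, l: tuple(rng.sample(DOMAINS, l)),
    accept=lambda cost_curr, cost_next: False,   # random sampling never accepts
    mutate_settings=lambda rec: rec,             # no settings in this toy
    rec_init=("intervals",), l_max=3, i_dom=2, i_set=1)
print(best)
```

Passing the generator and the acceptance function as parameters is what lets the same engine host the different search strategies.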


4.2 Recipe Evaluation 


The recipe evaluator from Fig. 1 uses a cost function to determine the quality 
of a fresh recipe with respect to the precision and performance of the abstract 
interpreter. This design is motivated by the fact that analysis imprecision and 
inefficiency are among the top pain points for users [10]. 

Therefore, the cost function depends on the number of generated warnings 
w (that is, the number of unverified assertions), the total number of assertions 
in the code w_total, the resource consumption r of the analyzer, and the resource
limit r_max imposed on the analyzer:


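\[
\mathit{cost}(w, w_{total}, r, r_{max}) =
\begin{cases}
\dfrac{w + \frac{r}{r_{max}}}{w_{total}}, & \text{if } r \leq r_{max} \\
\infty, & \text{otherwise}
\end{cases}
\]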


Note that w and r are measured by invoking the abstract interpreter with the 
recipe under evaluation. The cost function evaluates to a lower cost for recipes 
that improve the precision of the abstract interpreter (due to the term w/w_total).
In case of ties, the term r/r_max causes the function to evaluate to a lower cost
for recipes that result in a more efficient analysis. In other words, for two recipes 
resulting in equal precision, the one with the smaller resource consumption is 
assigned a lower cost. When a recipe causes the analyzer to exceed the resource 
limit, it is assigned infinite cost. 
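
A minimal Python sketch of the cost function follows, assuming it takes the form (w + r/r_max)/w_total within the resource limit and infinity beyond it. The inputs in the example are the FFMPEG numbers from Sect. 2 (45 assertions, a 1 s limit); the resulting cost values are computed here for illustration and are not reported in the paper.

```python
import math

def cost(w, w_total, r, r_max):
    """Cost of a recipe: w unverified assertions out of w_total, using
    resource r under the limit r_max (assumed form, see the lead-in)."""
    if r > r_max:
        return math.inf
    return (w + r / r_max) / w_total

default_cost = cost(w=28, w_total=45, r=0.35, r_max=1.0)     # 17 proved
tailored_cost = cost(w=23, w_total=45, r=0.57, r_max=1.0)    # 22 proved
precise_cost = cost(w=19, w_total=45, r=360.0, r_max=1.0)    # over 6 min
print(default_cost, tailored_cost, precise_cost)
```

Because w is an integer and r/r_max is at most 1 within the limit, proving one more assertion never costs more than any slowdown that stays within the limit, so the time term mainly breaks ties.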
4.3 Recipe Generation


In the literature, there is a broad range of optimization algorithms for different 
application domains. To demonstrate the generality and effectiveness of TAILOR, 
we instantiate it with four adaptations of three well-known optimization algo- 
rithms, namely random sampling [38], hill climbing (with regular restarts) [48], 
and simulated annealing [36,39]. Here, we describe these algorithms in detail, 
and in Sect. 5, we evaluate their effectiveness.

Before diving into the details, let us discuss the suitability of different kinds 
of optimization algorithms for our domain. There are algorithms that leverage 
mathematical properties of the function to be optimized, e.g., by computing 
derivatives as in Newton’s iterative method. Our cost function, however, is eval- 
uated by running an abstract interpreter, and thus, it is not differentiable or 
continuous. This constraint makes such analytical algorithms unsuitable. More- 
over, evaluating our cost function is expensive, especially for precise abstract 
domains such as Polyhedra. This makes algorithms that require a large number 
of samples, such as genetic algorithms, less practical. 

Now recall that Algorithm 1 is parametric in how new recipes are generated 
(with GENERATERECIPE) and accepted for further mutations (with ACCEPT). 
Instantiations of these functions essentially constitute our search strategy for 
a tailored recipe. In the following, we discuss four such instantiations. Note 
that, in theory, the order of recipe ingredients matters. This is because any 
properties verified by one ingredient are converted into assumptions for the next, 
and different assumptions may lead to different verification results. Therefore, 
all our instantiations are able to explore different ingredient orderings. 


Random Sampling. Random sampling (RS) just generates random recipes of a 
certain length. Function ACCEPT always returns false as each recipe is generated 
from scratch, and not as a result of any mutations. 


Domain-Aware Random Sampling. RS might generate recipes containing 
abstract domains of comparable precision. For instance, the Octagons domain is 
typically strictly more precise than Intervals. Thus, a recipe consisting of these 
domains is essentially equivalent to one containing only Octagons. 

Now, assume that we have a partially ordered set (poset) of domains that 
defines their ordering in terms of precision. An example of such a poset for a 
particular abstract interpreter is shown in Fig. 3. An optimization algorithm can
then leverage this information to reduce the search space of possible recipes. 
Given such a poset, we therefore define domain-aware random sampling (DARS), 
which randomly samples recipes that do not contain abstract domains of com- 
parable precision. Again, ACCEPT always returns false. 
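
The difference between RS and DARS can be sketched as follows. The precision order below is a small hypothetical poset, not the one in Fig. 3, and rejection sampling is only one possible way to enforce the constraint; whether RS may repeat a domain within a recipe is likewise an assumption of this sketch.

```python
import random

# Hypothetical cover relation: (less precise, immediately more precise).
COVERS = {("intervals", "zones"), ("zones", "octagons"),
          ("octagons", "polyhedra")}
DOMAINS = ["bool", "congruence", "intervals", "zones", "octagons", "polyhedra"]

def less_precise(a, b):
    """True if a is strictly less precise than b (transitive closure)."""
    frontier = {hi for lo, hi in COVERS if lo == a}
    seen = set()
    while frontier:
        d = frontier.pop()
        if d == b:
            return True
        if d not in seen:
            seen.add(d)
            frontier |= {hi for lo, hi in COVERS if lo == d}
    return False

def comparable(a, b):
    return a == b or less_precise(a, b) or less_precise(b, a)

def random_recipe(length, rng):
    """RS: any sequence of domains of the given length."""
    return [rng.choice(DOMAINS) for _ in range(length)]

def domain_aware_recipe(length, rng, max_tries=10_000):
    """DARS: resample until no two domains in the recipe are comparable."""
    for _ in range(max_tries):
        rec = random_recipe(length, rng)
        if all(not comparable(a, b)
               for i, a in enumerate(rec) for b in rec[i + 1:]):
            return rec
    raise ValueError("no recipe of this length satisfies the poset constraints")

rng = random.Random(1)
print(random_recipe(3, rng))
print(domain_aware_recipe(3, rng))
```

In this toy poset, the only valid recipes of length 3 combine bool, congruence, and exactly one domain from the Intervals-to-Polyhedra chain, which shows how much the poset shrinks the search space.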


Simulated Annealing. Simulated annealing (SA) searches for the best recipe by 
mutating the current recipe rec_curr in Algorithm 1. The resulting recipe (rec_next),
if accepted on line 12, becomes the new recipe to be mutated. Algorithm 2
shows an instantiation of GENERATERECIPE, which mutates a given recipe such 
that the poset precision constraints are satisfied (i.e., there are no domains of 
comparable precision). A recipe is mutated either by adding new ingredients with 
Algorithm 2: A recipe-generator instantiation.

 1 Function GENERATERECIPE(rec, l_max) is
 2   act := RANDOMACTION({ADD: 0.2, MOD: 0.8})
 3   if act = ADD ∧ LEN(rec) < l_max then
 4     ingr_new := RANDOMPOSETLEASTINCOMPARABLE(rec)
 5     rec_mut := ADDINGREDIENT(rec, ingr_new)
 6   else
 7     ingr := RANDOMINGREDIENT(rec)
 8     act_m := RANDOMACTION({GT: 0.5, LT: 0.3, INC: 0.2})
 9     if act_m = GT then
10       ingr_new := POSETGREATERTHAN(ingr)
11     else if act_m = LT then
12       ingr_new := POSETLESSTHAN(ingr)
13     else
14       rec_rem := REMOVEINGREDIENT(rec, ingr)
15       ingr_new := RANDOMPOSETLEASTINCOMPARABLE(rec_rem)
16     rec_mut := REPLACEINGREDIENT(rec, ingr, ingr_new)
17   if ¬POSETCOMPATIBLE(rec_mut) then
18     rec_mut := GENERATERECIPE(rec, l_max)
19   return rec_mut


20% probability or by modifying existing ones with 80% probability (line 2). The 
probability of adding ingredients is lower to keep recipes short. 

When adding a new ingredient (lines 4-5), Algorithm 2 calls
RANDOMPOSETLEASTINCOMPARABLE, which considers all domains that are incomparable
with the domains in the recipe. Given this set, it randomly selects from the
domains with the least precision to avoid adding overly expensive domains. When 
modifying a random ingredient in the recipe (lines 7-16), the algorithm can 
replace its domain with one of three possibilities: a domain that is immediately 
more precise (i.e., not transitively) in the poset (via POSETGREATERTHAN), a 
domain that is immediately less precise (via POSETLESSTHAN), or an incompa- 
rable domain with the least precision (via RANDOMPOSETLEASTINCOMPARA- 
BLE). If the resulting recipe does not satisfy the poset precision constraints, our 
algorithm retries to mutate the original recipe (lines 17-18). 
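
The mutation step can be prototyped in Python as shown below. This is a sketch under assumptions: recipes are plain lists of domain names (settings are omitted), the input recipe is non-empty, the poset is a small hypothetical one rather than Fig. 3, and a mutation with no candidate domain is treated like a constraint violation and retried.

```python
import random

# Hypothetical cover relation: (less precise, immediately more precise).
COVERS = {("intervals", "zones"), ("zones", "octagons"),
          ("octagons", "polyhedra"), ("bool", "boxes")}
DOMAINS = sorted({d for pair in COVERS for d in pair})

def below(a, b):
    """a is strictly less precise than b (transitive closure of COVERS)."""
    return any(lo == a and (hi == b or below(hi, b)) for lo, hi in COVERS)

def comparable(a, b):
    return a == b or below(a, b) or below(b, a)

def poset_compatible(rec):
    return all(not comparable(a, b)
               for i, a in enumerate(rec) for b in rec[i + 1:])

def least_incomparable(rec, rng):
    """Random least-precise domain that is incomparable with all of rec."""
    cands = [d for d in DOMAINS if all(not comparable(d, r) for r in rec)]
    least = [d for d in cands if not any(below(c, d) for c in cands)]
    return rng.choice(least) if least else None

def generate_recipe(rec, l_max, rng):
    if rng.random() < 0.2 and len(rec) < l_max:             # ADD
        new = least_incomparable(rec, rng)
        mut = rec + [new] if new else None
    else:                                                   # MOD
        i = rng.randrange(len(rec))
        u = rng.random()
        if u < 0.5:                                         # GT
            opts = [hi for lo, hi in COVERS if lo == rec[i]]
        elif u < 0.8:                                       # LT
            opts = [lo for lo, hi in COVERS if hi == rec[i]]
        else:                                               # INC
            inc = least_incomparable(rec[:i] + rec[i + 1:], rng)
            opts = [inc] if inc else []
        mut = rec[:i] + [rng.choice(opts)] + rec[i + 1:] if opts else None
    if mut is None or not poset_compatible(mut):
        return generate_recipe(rec, l_max, rng)             # retry from rec
    return mut

rng = random.Random(0)
rec = ["intervals"]
for _ in range(5):
    rec = generate_recipe(rec, 3, rng)
    print(rec)
```

Using only the cover relation for GT and LT is what makes each step move to an immediately more or less precise domain rather than jumping across the poset.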

For simulated annealing, ACCEPT returns true if the new cost (for the 
mutated recipe) is less than the current cost. It also accepts recipes whose cost 
is higher with a certain probability, which is inversely proportional to the cost 
increase and the number of explored recipes. That is, recipes with a small cost 
increase are likely to be accepted, especially at the beginning of the exploration. 
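
One way to realize such an acceptance rule is the classic Metropolis criterion with a temperature that decreases as more recipes are explored. The text above only states the qualitative behavior, so the exponential form, the cooling schedule, and the parameter `t0` in this sketch are assumptions rather than TAILOR's actual formula.

```python
import math
import random

def acceptance_probability(cost_curr, cost_next, explored, t0=1.0):
    """Probability of accepting the mutated recipe (assumed Metropolis rule)."""
    if cost_next < cost_curr:
        return 1.0                      # improvements are always accepted
    if math.isinf(cost_next):
        return 0.0                      # never move to a recipe over the limit
    temperature = t0 / explored         # cools down as exploration proceeds
    return math.exp(-(cost_next - cost_curr) / temperature)

def make_accept(rng, t0=1.0):
    """Build an ACCEPT function that counts the recipes explored so far."""
    explored = 0
    def accept(cost_curr, cost_next):
        nonlocal explored
        explored += 1
        return rng.random() < acceptance_probability(
            cost_curr, cost_next, explored, t0)
    return accept

accept = make_accept(random.Random(0))
print(accept(0.6, 0.5))
print(acceptance_probability(0.5, 0.6, explored=1))
print(acceptance_probability(0.5, 0.6, explored=50))
```

Under these assumptions, a small cost increase early in the search is accepted with high probability, while the same increase late in the search is almost always rejected.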


Hill Climbing. Our instantiation of hill climbing (HC) performs regular restarts. 
In particular, it starts with a randomly generated recipe that satisfies the poset 
precision constraints, generates 10 new valid recipes, and restarts with a random 
recipe. ACCEPT returns true only if the new cost is lower than the best cost, 
which is equivalent to the current cost. 

5 Experimental Evaluation


To evaluate our technique, we aim to answer the following research questions: 


RQ1: Is our technique effective in tailoring recipes to different usage scenarios? 
RQ2: Are the tailored recipes optimal?

RQ3: How diverse are the tailored recipes? 

RQ4: How resilient are the tailored recipes to code changes? 


5.1 Implementation 


We implemented TAILOR by extending CRAB [25], a parametric framework for 
modular construction of abstract interpreters⁴. We extended CRAB with the
ability to pass verification results between recipe ingredients as well as with the 
four optimization algorithms discussed in Sect. 4.3. 

Table 1 shows all settings and values used in our evaluation. The first three 
settings refer to the strategies discussed in Sect. 3 for mitigating the precision loss 
incurred by widening. For the initial recipe, TAILOR uses Intervals and the CRAB 
default values for all other settings (in bold in the table). To make the search more 
efficient, we selected a representative subset of all possible setting values. 

CRAB uses a DSA-based [26] pointer analysis and can, optionally, reason 
about array contents using array smashing. It offers a wide range of logico- 
numerical domains, shown in Fig. 3. The bool domain is the flat Boolean
domain, ric is a reduced product of Intervals and Congruence, and term(int) 
and term(disInt) are instantiations of the Term domain with intervals and 
disInt, respectively. Although CRAB provides a bottom-up inter-procedural 
analysis, we use the default intra-procedural analysis; in fact, most analyses 
deployed in real usage scenarios are intra-procedural due to time constraints [10]. 


5.2 Benchmark Selection 


For our evaluation, we systematically selected popular and (at some point) active 
C projects on GitHub. In particular, we chose the six most starred C repositories 


Table 1. CRAB settings and their possible values as used in our experiments. Default 
settings are shown in bold. 


Setting | Possible values
NUM_DELAY_WIDEN | {1, 2, 4, 8, 16}
NUM_NARROW_ITERATIONS | {1, 2, 3, 4}
NUM_WIDEN_THRESHOLDS | {0, 10, 20, 30, 40}
BACKWARD ANALYSIS | {OFF, ON}
ARRAY SMASHING | {OFF, ON}
ABSTRACT DOMAINS | All domains in Fig. 3


4 CRAB is available at https://github.com/seahorn/crab. 

Table 2. Overview of projects.


Project | Description
CURL | Tool for transferring data by URL
DARKNET | Convolutional neural-network framework
FFMPEG | Multimedia processing tool
GIT | Distributed version-control tool
PHP-SRC | PHP interpreter
REDIS | Persistent in-memory database


with over 300 commits that we could successfully build with the Clang-5.0 com- 
piler. We give a short description of each project in Table 2. 

For analyzing these projects, we needed to introduce properties to be verified. 
We, thus, automatically instrumented these projects with four types of assertions 
that check for common bugs; namely, division by zero, integer overflow, buffer 
overflow, and use after free. Introducing assertions to check for runtime errors 
such as these is common practice in program analysis and verification. 

As projects consist of different numbers of files, to avoid skewing the results 
in favor of a particular project, we randomly and uniformly sampled 20 LLVM- 
bitcode files from each project, for a total of 120. To ensure that each file was 
neither too trivial nor too difficult for the abstract interpreter, we used the num- 
ber of assertions as a complexity indicator and only sampled files with at least 20 
assertions and at most 100. Additionally, to guarantee all four assertion types 
were included and avoid skewing the results in favor of a particular assertion 
type, we required that the sum of assertions for each type was at least 70 across 
all files—this exact number was largely determined by the benchmarks. 
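
The selection criteria above can be sketched as a small sampling routine. This is a hedged illustration, not the scripts used for the evaluation: the data layout, the assertion-type names, and the resample-until-satisfied loop are assumptions.

```python
import random

# Hypothetical names for the four instrumented assertion types.
ASSERTION_TYPES = ("div_by_zero", "int_overflow", "buf_overflow", "use_after_free")

def eligible(counts, lo=20, hi=100):
    """A file qualifies if its total assertion count lies in [lo, hi]."""
    return lo <= sum(counts.values()) <= hi

def sample_benchmarks(projects, per_project=20, min_per_type=70, rng=None, max_tries=1000):
    """projects maps project -> {file name -> {assertion type -> count}}.

    Uniformly samples `per_project` eligible files from every project and
    resamples (an assumed strategy) until each assertion type occurs at
    least `min_per_type` times across all chosen files.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        chosen = {}
        for project, files in projects.items():
            pool = sorted(name for name, counts in files.items() if eligible(counts))
            if len(pool) < per_project:
                raise ValueError(f"{project}: only {len(pool)} eligible files")
            for name in rng.sample(pool, per_project):
                chosen[(project, name)] = files[name]
        totals = {t: sum(c.get(t, 0) for c in chosen.values()) for t in ASSERTION_TYPES}
        if min(totals.values()) >= min_per_type:
            return chosen
    raise RuntimeError("no sample met the per-type minimum")
```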

Overall, our benchmark suite of 120 files totals 1346 functions, 5557 assertions 
(on average 4 assertions per function), and 667927 LLVM instructions (Table 3). 


5.3 Results 


We now present our experimental results for each research question. We performed 
all experiments on a 32-core Intel® Xeon® E5-2667 v2 CPU @ 3.30 GHz machine
with 264 GB of memory, running Ubuntu 16.04.1 LTS. 


[Figure: poset diagram over the domains polyhedra, boxes, octagons, term(disInt), zones, term(int), ric, disInt, intervals, and bool]


Fig. 3. Comparing logico-numerical domains in CRAB. A domain d1 is less precise than
d2 if there is a path from d1 to d2 going upward; otherwise, d1 and d2 are incomparable.

Table 3. Benchmark characteristics (20 files per project). The last three columns show
the number of functions, assertions, and LLVM instructions in the analyzed files.


Project | Functions | Assertions | LLVM instructions
CURL | 306 | 787 | 50 541
DARKNET | 130 | 958 | 55 847
FFMPEG | 103 | 888 | 27 653
GIT | 218 | 768 | 102 304
PHP-SRC | 268 | 1031 | 305 943
REDIS | 321 | 1125 | 125 639
Total | 1346 | 5557 | 667 927


RQ1: Is Our Technique Effective in Tailoring Recipes to Different 
Usage Scenarios? We instantiated TAILOR with the four optimization algo- 
rithms described in Sect. 4.3: RS, DARS, SA, and HC. We constrained the analysis 
time to simulate two usage scenarios: 1 s for instant feedback in the editor, and
5 min for feedback in a CI pipeline. We compare TAILOR with the default recipe
(DEF), i.e., the default settings in CRAB as defined by its designer after careful 
tuning on a large set of benchmarks over the years. DEF uses a combination 
of two domains, namely, the reduced product of Boolean and Zones. The other 
default settings are in Table 1. 

For this experiment, we ran TAILOR with each optimization algorithm on 
the 120 benchmark files, enabling optimization at the granularity of files. Each 
algorithm was seeded with the same random seed. In Algorithm 1, we restrict 
recipes to contain at most 3 domains (l_max = 3) and set the number of iterations
for each phase to be 5 and 10 (i_dom = 5 and i_set = 10).

The results are presented in Fig. 4, which shows the number of assertions
that are verified with the best recipe found by each algorithm as well as by 
the default recipe. All algorithms outperform the default recipe for both usage 
scenarios, verifying almost twice as many assertions on average. The random- 

Fig. 4. Comparison of the number of assertions verified with the best recipe generated
by each optimization algorithm and with the default recipe, for varying timeouts.

Fig. 5. Comparison of the number of assertions verified by a tailored vs. the default
recipe.

Fig. 6. Comparison of the total time (in sec) that each algorithm requires for all
iterations, for varying timeouts.


sampling algorithms are shown to find better recipes than the others, with DARS 
being the most effective. Hill climbing is less effective since it gets stuck in local 
cost minima despite restarts. Simulated annealing is the least effective because 
it slowly climbs up the poset toward more precise domains (see Algorithm 2). 
However, as we explain later, we expect the algorithms to converge on the number 
of verified assertions for more iterations. 

Figure 5 gives a more detailed comparison with the default recipe for the
time limit of 5 min. In particular, each horizontal bar shows the total number of 
assertions verified by each algorithm. The orange portion represents the asser- 
tions verified by both the default recipe and the optimization algorithm, while 
the green and red portions represent the assertions only verified by the algo- 
rithm and default recipe, respectively. These results show that, in addition to 
verifying hundreds of new assertions, TAILOR is able to verify the vast majority 
of assertions proved by the default recipe, regardless of optimization algorithm. 

In Fig. 6, we show the total time each algorithm takes for all iterations. DARS 
takes the longest. This is due to generating more precise recipes thanks to its 
domain knowledge. Such recipes typically take longer to run but verify more 
assertions (as in Fig. 4). On average, for all algorithms, TAILOR requires only 
30 s to complete all iterations for the 1-s timeout and 16 min for the 5-min 
timeout. As discussed in Sect. 2, this tuning time can be spent offline. 

Fig. 7. Comparison of the number of assertions verified with the best recipe generated
by the different optimization algorithms, for different numbers of iterations. 


Figure 7 compares the total number of assertions verified by each algorithm 
when TAILOR runs for 40 (i_dom = 5 and i_set = 10) and 80 (i_dom = 10 and i_set = 20)
iterations. The results show that only a relatively small number of additional
assertions are verified with 80 iterations. In fact, we expect the algorithms to 
eventually converge on the number of verified assertions, given the time limit 
and precision of the available domains. 

As DARS performs best in this comparison, we only evaluate DARS in the 
remaining research questions. We use a 5-min timeout. 


RQ1 takeaway: TAILOR verifies between 1.6× and 2.1× as many assertions as
the default recipe, regardless of optimization algorithm, timeout, or
number of iterations. In fact, even very simple algorithms (such as RS) 
significantly outperform the default recipe. 


RQ2: Are the Tailored Recipes Optimal? To check the optimality of the 
tailored recipes, we compared them with the most precise (and least efficient) 
CRAB configuration. It uses the most precise domains from Fig. 3 (i.e., bool,
polyhedra, term(int), ric, boxes, and term(disInt)) in a recipe of 6 ingre- 
dients and assigns the most precise values to all other settings from Table 1. We 
generously gave a 30-min timeout to this recipe. 

For 21 out of 120 files, the most precise recipe ran out of memory (264 GB). 
For 86 files, it terminated within 5 min, and for 13, it took longer (within
30 min); in many cases, this was even longer than TAILOR’s tuning time in
Fig. 6. We compared the number of assertions verified by our tailored recipes
(which do not exceed 5 min) and by the most precise recipe. For the 86 files that 
terminated within 5 min, our recipes prove 618 assertions, whereas the most pre- 
cise recipe proves 534. For the other 13 files, our recipes prove 119 assertions, 
whereas the most precise recipe proves 98. 

Consequently, our (in theory) less precise and more efficient recipes prove 
more assertions in files where the most precise recipe terminates. Possible expla- 
nations for this non-intuitive result are: (1) Polyhedra coefficients may overflow, 
in which case the constraints are typically ignored by abstract interpreters, and 

Fig. 8. Effect of different settings on the precision and performance of the abstract
interpreter. (DW: NUM_DELAY_WIDEN, NI: NUM_NARROW_ITERATIONS, WT: NUM_WIDEN_THRESHOLDS,
AS: array smashing, B: backward analysis, D: abstract domain, O: ingredient ordering).
[Axis: percentage of mutated recipes]


(2) more precise domains with different widening operations may result in less 
precise results [2,45]. 

We also evaluated the optimality of tailored recipes by mutating individual 
parts of the recipe and comparing to the original. In particular, for each setting 
in Table 1, we tried all possible values and replaced each domain with all other 
comparable domains in the poset of Fig. 3. For example, for a recipe including 
zones, we tried octagons, polyhedra, and intervals. In addition, we tried 
all possible orderings of the recipe ingredients, which in theory could produce 
different results. We observed whether these changes resulted in a difference in 
the precision and performance of the analyzer. 

Figure 8 shows the results of this experiment, broken down by setting. Equal 
(in orange) indicates that the mutated recipe proves the same number of
assertions within ±5 s of the original. Positive (in green) indicates that it either proves
more assertions or the same number of assertions at least 5 s faster. Negative (in
red) indicates that the mutated recipe either proves fewer assertions or the same
number of assertions at least 5 s slower.
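
The three outcome classes can be expressed as a small decision function. The sketch below is an illustration with hypothetical names; how a difference of exactly 5 s is classified is not specified in the text, so treating it as Positive or Negative rather than Equal is an assumption.

```python
def classify_mutation(orig_verified, orig_time, mut_verified, mut_time, tol=5.0):
    """Compare a mutated recipe against the original recipe.

    The number of verified assertions decides first; the running time
    (in seconds) only breaks ties, using a tolerance of `tol` seconds.
    """
    if mut_verified > orig_verified:
        return "positive"
    if mut_verified < orig_verified:
        return "negative"
    delta = mut_time - orig_time
    if delta <= -tol:      # same precision, at least `tol` seconds faster
        return "positive"
    if delta >= tol:       # same precision, at least `tol` seconds slower
        return "negative"
    return "equal"
```

For instance, a mutated recipe that proves the same 40 assertions in 12 s instead of 10 s is classified as Equal, whereas one that needs 20 s is classified as Negative.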

The results show that, for our benchmarks, mutating the recipe found by 
TAILOR rarely led to an improvement. In particular, at least 93% of all mutated 
recipes were either equal to or worse than the original recipe. In the majority 
of these cases, mutated recipes are equally good. This indicates that there are 
many optimal or close-to-optimal solutions and that TAILOR is able to find one. 


RQ2 takeaway: As compared to the most precise recipe, TAILOR 
verified more assertions across benchmarks where the most precise 
recipe terminated. Furthermore, mutating recipes found by TAILOR
resulted in an improvement for fewer than 7% of recipes.


RQ3: How Diverse are the Tailored Recipes? To motivate the need for 
optimization, we must show that tailored recipes are sufficiently diverse such that 
they could not be replaced by a well-crafted default recipe. To better understand 
the characteristics of tailored recipes, we manually inspected all of them. 

Fig. 9. Occurrence of domains (in %) in the best recipes for all assertion types.


TAILOR generated recipes of length greater than 1 for 61 files. Out of these, 
37 are of length 2 and 24 of length 3. For 77% of generated recipes, NUM_DELAY_WIDEN
is not set to the default value of 1. Additionally, 55% of the ingredients
enable array smashing, and 32% enable backward analysis. 

Figure 9 shows how often (in percentage) each abstract domain occurs in a
best recipe found by TAILOR. We observe that all domains occur almost equally 
often, with 6 of the 10 domains occurring in between 9% and 13% of recipes. The 
most common domain was bool at 18%, and the least common was intervals 
at 4%. We observed a similar distribution of domains even when instrumenting 
the benchmarks with only one assertion type, e.g., checking for integer overflow. 

We also inspected which domain combinations are frequently used in the tai- 
lored recipes. One common pattern is combinations between bool and numerical 
domains (18 occurrences). Similarly, we observed 2 occurrences of term(disInt) 
together with zones. Interestingly, the less powerful variants of combining 
disInt with zones (3 occurrences) and term(int) with zones (6 occurrences) 
seem to be sufficient in many cases. Finally, we observed 8 occurrences of 
polyhedra or octagons with boxes, which are the most precise convex and 
non-convex domains. Our approach is, thus, not only useful for users, but also 
for designers of abstract interpreters by potentially inspiring new domain com- 
binations. 


RQ3 takeaway: The diversity of tailored recipes prevents replacing 
them with a single default recipe. Over half of the tailored recipes 
contain more than one ingredient, and ingredients use a variety of 
domains and their settings. 


RQ4: How Resilient are the Tailored Recipes to Code Changes? We 
expect tailored recipes to be resilient to code changes, i.e., to retain their opti- 
mality across several changes without requiring re-tuning. We now evaluate if a 
recipe tailored for one code version is also tailored for another, even when the 
two versions are 50 commits apart. 

For this experiment, we took a random sample of 60 files from our benchmarks 
and retrieved the 50 most recent commits per file. We only sampled 60 out of 

Fig. 10. Difference in the safe assertions across commits.


120 files as building these files for each commit is quite time consuming—it can 
take up to a couple of days. We instrumented each file version with the four 
assertion types described in Sect. 5.2. It should be noted that, for some files, we 
retrieved fewer than 50 versions either because there were fewer than 50 total 
commits or our build procedure for the project failed on older commits. This is 
also why we did not run this experiment for over 50 commits. 

We analyzed each file version with the best recipe, Ro, found by TAILOR for 
the oldest file version. We compared this recipe with new best recipes, Rn, that 
were generated by TAILOR when run on each subsequent file version. For this 
experiment, we used a 5-min timeout and 40 iterations. 

Note that, when running TAILOR with the same optimization algorithm and 
random seed, it explores the same recipes. It is, therefore, very likely that recipe 
Ro for the oldest commit is also the best for other file versions since we only 
explore 40 different recipes. To avoid any such bias, we performed this experiment 
by seeding TAILOR with a different random seed for each commit. The results 
are shown in Fig. 10. 

In Fig. 10, we give a bar chart comparing the number of files per commit 
that have a positive, equal, and negative difference in the number of verified 
assertions, where commit 0 is the oldest commit and 49 the newest. An equal 
difference (in orange) means that recipe Ro for the oldest commit proves the 
same number of assertions in the current file version, fn, as recipe Rn found by 
running TAILOR on fn. To be more precise, we consider the two recipes to be 
equal if they differ by at most 1 verified assertion or 1% of verified assertions since 
such a small change in the number of safe assertions seems acceptable in practice 
(especially given that the total number of assertions may change across commits). 
A positive difference (in green) means that Ro achieves better verification results
than Rn, that is, Ro proves more assertions safe (over 1 assertion or 1% of the
assertions that Rn proves). Analogously, a negative difference (in red) means
that Ro proves fewer assertions. We do not consider time here because none of 
the recipes timed out when applied on any file version. 
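
The comparison between Ro and Rn can be sketched as follows. This is an illustration only: the function name is hypothetical, and taking the tolerance as the larger of 1 assertion and 1% of the assertions proved by Rn is one assumed reading of the criterion above.

```python
def compare_recipes(safe_old, safe_new):
    """Classify the difference between the assertions proved safe by Ro
    (`safe_old`) and by Rn (`safe_new`) on the same file version."""
    # Assumed tolerance: 1 assertion or 1% of Rn's result, whichever is larger.
    tolerance = max(1.0, 0.01 * safe_new)
    diff = safe_old - safe_new
    if diff > tolerance:
        return "positive"   # Ro proves noticeably more assertions
    if diff < -tolerance:
        return "negative"   # Ro proves noticeably fewer assertions
    return "equal"
```

Under this reading, 298 versus 300 safe assertions counts as Equal (the tolerance is 3), whereas 40 versus 50 counts as Negative.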

Note that the number of files decreases for newer commits. This is because 
not all files go forward by 50 commits, and even if they do, not all file versions 
build. However, in a few instances, the number of files increases going forward
in time. This happens for files that change names, and later, change back, which
we do not account for. 

For the vast majority of files, using recipe Ro (found for the oldest commit)
is as effective as using Rn (found for the current commit). The difference in safe
assertions is negative for less than a quarter of the files tested, with the average
negative difference among these files being around 22% (i.e., Ro proved 22%
fewer assertions than Rn in these files). On the remaining three quarters of the
files tested, however, Ro proves at least as many assertions as Rn, and thus, Ro
tends to be tailored across code versions.

Commits can result in both small and large changes to the code. We therefore 
also measured the average difference in the number of verified assertions per 
changed line of code with respect to the oldest commit. For most files, regardless 
of the number of changed lines, we found that Ro and Rn are equally effective,
with changes to 1000 LOC or more resulting in little to no loss in precision. In
particular, the median difference in safe assertions across all changes between
Ro and Rn was 0 (i.e., Ro proved the same number of assertions safe as Rn),
with a standard deviation of 15 assertions. We manually inspected a handful
of outliers where Ro proved significantly fewer assertions than Rn (difference
of over 50 assertions). These were due to one file from GIT where Ro is not as
effective because the widening and narrowing settings have very low values.


RQ4 takeaway: For over 75% of files, TAILOR’s recipe for an earlier
commit (up to 50 commits back) remains tailored for future
versions of the file, indicating the resilience of tailored recipes across 
code changes. 


5.4 Threats to Validity 


We have identified the following threats to the validity of our experiments. 


Benchmark Selection. Our results may not generalize to other bench- 
marks. However, we selected popular GitHub projects from different application 
domains (see Table 2). Hence, we believe that our benchmark selection mitigates 
this threat and increases generalizability of our findings. 


Abstract Interpreter and Recipe Settings. For our experiments, we only 
used a single abstract interpreter, CRAB, which however is a mature and actively 
supported tool. The selection of recipe settings was, of course, influenced by the 
available settings in CRAB. Nevertheless, CRAB implements the generic
architecture of Fig. 2, used by most abstract interpreters, such as those mentioned
at the beginning of Sect. 3. We, therefore, expect our approach to generalize to
such analyzers. 


Optimization Algorithms. We considered four optimization algorithms, but 
in Sect. 4.3, we explain why these are suitable for our application domain. More- 
over, TAILOR is configurable with respect to the optimization algorithm. 


Assertion Types. Our results are based on four types of assertions. However,
these cover a wide range of runtime errors that are commonly checked by static 
analyzers. 


6 Related Work 


The impact of different abstract-interpretation configurations has been previ- 
ously evaluated [54] for Java programs and partially inspired this work. To the 
best of our knowledge, we are the first to propose tailoring abstract interpreters 
to custom usage scenarios using optimization. 

However, optimization is a widely used technique in many engineering dis- 
ciplines. In fact, it is also used to solve the general problem of algorithm confi- 
guration [31], of which there exist numerous instantiations, for instance, to 
tune hyper-parameters of learning algorithms [3,18,52] and options of constraint
solvers [32,33]. Existing frameworks for algorithm configuration differ from ours 
in that they are not geared toward problems that are solved by sequences of 
algorithms, such as analyses with different abstract domains. Even if they were, 
our experience with TAILOR shows that there seem to be many optimal or close- 
to-optimal configurations, and even very simple optimization algorithms such as 
RS are surprisingly effective (see RQ2); similar observations were made about 
the effectiveness of random search in hyper-parameter tuning [4]. 

In the rest of this section, we focus on the use of optimization in program 
analysis. It has been successfully applied to a number of program-analysis prob- 
lems, such as automated testing [19,20], invariant inference [50], and compiler 
optimizations [49]. 

Recently, researchers have started to explore the direction of enriching pro- 
gram analyses with machine-learning techniques, for example, to automatically 
learn analysis heuristics [27,34,47,51]. A particularly relevant body of work is 
on adaptive program analysis [28-30], where existing code is analyzed to learn 
heuristics that trade soundness for precision or that coarsen the analysis abstrac- 
tions to improve memory consumption. More specifically, adaptive program anal- 
ysis poses different static-analysis problems as machine-learning problems and 
relies on Bayesian optimization to solve them, e.g., the problem of selectively 
applying unsoundness to different program components (e.g., different loops in 
the program) [30]. The main insight is that program components (e.g., loops) 
that produce false positives are alike, predictable, and share common proper- 
ties. After learning to identify such components for existing code, this technique 
suggests components in unseen code that should be analyzed unsoundly. 

In contrast, TAILOR currently does not adjust soundness of the analysis. 
However, this would also be possible if the analyzer provided the corresponding 
configurations. More importantly, adaptive analysis focuses on learning analysis 
heuristics based on existing code in order to generalize to arbitrary, unseen code. 
TAILOR, on the other hand, aims to tune the analyzer configuration to a custom 
usage scenario, including a particular program under analysis. In addition, the 
custom usage scenario imposes user-specific resource constraints, for instance by 
limiting the time according to a phase of the software-engineering life cycle. As
we show in our experiments, the tuned configuration remains tailored to several 
versions of the analyzed program. In fact, it outperforms configurations that are 
meant to generalize to arbitrary programs, such as the default recipe. 


7 Conclusion 


In this paper, we have proposed a technique and framework that tailors a generic 
abstract interpreter to custom usage scenarios. We instantiated our framework 
with a mature abstract interpreter to perform an extensive evaluation on real- 
world benchmarks. Our experiments show that the configurations generated by 
TAILOR are vastly better than the default options, vary significantly depend- 
ing on the code under analysis, and typically remain tailored to several subse- 
quent code versions. In the future, we plan to explore the challenges that an 
inter-procedural analysis would pose, for instance, by using a different recipe for 
computing a summary of each function or each calling context. 


Abstract. We develop machine-checked verifications of the full func-
tional correctness of C implementations of the eponymous graph algo- 
rithms of Dijkstra, Kruskal, and Prim. We extend Wang et al.’s Cer- 
tiGraph platform to reason about labels on edges, undirected graphs, 
and common spatial representations of edge-labeled graphs such as adja- 
cency matrices and edge lists. We certify binary heaps, including Floyd’s 
bottom-up heap construction, heapsort, and increase/decrease priority. 
Our verifications uncover subtle overflows implicit in standard text- 
book code, including a nontrivial bound on edge weights necessary to 
execute Dijkstra’s algorithm; we show that the intuitive guess fails and 
provide a workable refinement. We observe that the common notion that 
Prim’s algorithm requires a connected graph is wrong: we verify that 
a standard textbook implementation of Prim’s algorithm can compute 
minimum spanning forests without finding components first. Our verifi- 
cation of Kruskal’s algorithm reasons about two graphs simultaneously: 
the undirected graph undergoing MSF construction, and the directed 
graph representing the forest inside union-find. Our binary heap verifi- 
cation exposes precise bounds for the heap to operate correctly, avoids a 
subtle overflow error, and shows how to recycle keys to avoid overflow. 


Keywords: Separation logic · Graph algorithms · Coq · VST


1 Introduction 


Dijkstra’s eponymous shortest-path algorithm [22] finds the cost-minimal paths 
from a distinguished source vertex to all reachable vertices in a directed graph. 
Prim’s [61] and Kruskal’s [42] algorithms return minimal spanning trees for undi- 
rected graphs. Binary heaps are the first priority queue one typically encoun- 
ters. These algorithms/structures are classic and ubiquitous, appearing widely 
in textbooks [20,33,36,65,66,68] and in real routing protocol libraries. 

In addition to decades of use and textbook analysis, recent efforts have ver- 
ified one or more of these algorithms in proof assistants and formally proved 
claims about their behavior [12,15,30,45,53]. A reasonable person might think
that all that can be said, has been. However, we have found that textbook code 
glosses over a cornucopia of issues that routinely crop up in real-world settings: 
under/overflows, integration with performant data structures, manual memory
(de-)allocation, error handling, casts, memory alignment, etc. Further, previous 
verification efforts with formal checkers often operate within idealized formal 
environments, which likewise leads them to ignore the same kinds of issues. 

In our work, we provide C implementations of each of these algorithms/data 
structures, and prove in Coq [71] the functional correctness of the same with 
respect to the formal semantics of CompCert C [50]. By “functional correctness” 
we mean natural algorithmic specifications; we do not prove resource bounds. 
Although our C code is developed from standard textbooks, we uncover several 
subtleties that are absent from the algorithmic and formal methods literature: 


§3.2 an overflow in Dijkstra’s algorithm, avoiding which requires a nontrivial 
refinement to the algorithm’s precondition to bound edge weights; 

§4.2 that the specification of Prim’s algorithm can be improved to apply to 
disconnected graphs without any change to textbook (pseudo-)code; 

§4.2 the presence of a wholly unneeded line of (pseudo-)code in Prim’s algorithm,
and an associated unneeded function argument;

§5 several potential overflows in binary heaps equipped with Floyd’s linear-time
build-heap function and an edit-priority operation.


We wish to develop general and reusable techniques for verifying graph- 
manipulating programs written in real programming languages. This is a sig- 
nificant challenge, and so we choose to leverage and/or extend three large exist- 
ing proof developments to state and prove the full functional correctness of our 
code in Coq: CompCert; the Verified Software Toolchain [4] (VST) separation 
logic [59] deductive verifier; and our own previous efforts [73], hereafter dubbed 
the CertiGraph project. Our primary extensions are to the third, and include: 


§2.1 pure/abstract reasoning for graphs with edge labels, (e.g., a distinguished 
edge-label value for “infinity” that indicates invalid/absent edges); 

§2.2 spatial representations and associated reasoning for edge-labeled graphs 
(several flavors of adjacency matrices as well as edge lists); 

§2.3 pure reasoning for undirected graphs (e.g., notions of connectedness). 


We prove that our pure machinery and our spatial machinery are well-isolated 
from each other by verifying several implementations (of each of Dijkstra and 
Prim) that represent graphs differently in memory but reuse the entire pure 
portion of the proof. Likewise, we show that our spatial reasoning is generic 
by reusing graph representations across Dijkstra and Prim. Our verification of 
Kruskal proves that we can reason about two graphs simultaneously: a directed 
graph with vertex labels for union-find and an undirected graph with edge labels 
for which we are building a spanning forest. In addition to our verification of 
Dijkstra, Prim, and Kruskal, we develop increased lemma support for the preexisting
CertiGraph union-find example [73]. Our extension to “base VST” (e.g.,
verifications without graphs) primarily consists of our verified binary heap.

The remainder of this paper is organized as follows:


§2 We explain our extensions to CertiGraph: edge-labeled graphs, spatial rep- 
resentations of such graphs, and undirected graphs. 

§3 We explain our verification of Dijkstra’s algorithm in some detail, discuss 
a potential overflow, and refine the precondition to avoid it. 

§4 We overview our verifications of the Minimum Spanning Tree/Forest algo- 
rithms of Prim and Kruskal, focusing on high-level points such as our 
improved novel specification of Prim’s. 

§5 We overview our verification of binary heaps, including a discussion of
Floyd’s bottom-up heap construction and the edit_priority operation.

§6 We briefly discuss engineering, e.g. statistics for our formal development.

§7 We discuss related work, outline future research directions, and conclude.


Our results are completely machine-checked in Coq and publicly available [1]. 


2 Extensions to CertiGraph 


We begin with the briefest of introductions to CertiGraph’s core structure and 
then detail the extensions we make to various levels of CertiGraph in service of 
our Dijkstra, Prim, and Kruskal verifications. Ignoring modularity and eliding 
elements not used in this work, a mathematical graph in CertiGraph is a tuple: 
(V, E, vvalid, evalid, src, dst, vlabel, elabel, sound). Here V/E are the carrier
types of vertices/edges, vvalid/evalid place restrictions specifying whether
a vertex/edge is valid¹, and src/dst : E → V map edges to their source/destination.
Labels are allowed on vertices and edges, and a soundness condition
allows custom application-specific restrictions [73]. Mathematical graphs connect 
to graphs in computer memory via spatial predicates in separation logic. 


2.1 Pure Reasoning for Adjacency Matrix-Represented Graphs 


Two of our algorithms operate over graphs represented as adjacency matrices. 
Not every legal graph can be represented as an adjacency matrix, so we develop a 
unified, reusable, and extendable soundness condition SoundAdjMat that a graph 
must satisfy in order for it to be represented as an adjacency matrix. 
SoundAdjMat is parameterized by the graph’s size and a distinguished
number inf. We restrict most fields in the tuple: (V = ℤ, E = ℤ × ℤ,
vvalid = λv. 0 ≤ v < size, evalid = ..., src = fst, dst = snd, vlabel,
elabel, sound = ...). We also restrict the carrier type of vertex labels to unit
and edge labels to ℤ. We require that the parameters size and inf be strictly
positive and representable on the machine. Most critical, however, is the semantics
of evalid: a valid edge must have a machine-representable label and that label
cannot have value inf; an invalid edge must have label inf. Last, the graph
must be finite.

¹ Validity denotes presence in the graph: e.g., if we are using ℤ as the carrier type V,
and have only 7 vertices, then vvalid(x) is probably the proposition 0 ≤ x < 7.

The restriction on edge labels is necessary because we are working with 
labeled adjacency matrices on a real system: we need to set aside a distinguished 
number inf such that edge weight inf indicates the absence of an edge. We cannot
prescribe some inf because client needs can vary widely. For instance, our
verifications of Dijkstra’s and Prim’s algorithms require subtly different infs. 

SoundAdjMat guarantees spatial representability as an adjacency matrix, 
but it can be extended with further algorithm-specific restrictions before being 
plugged in for sound. Dijkstra’s algorithm requires nonnegative edge weights, 
and—as we will discuss in §3.2—nontrivial restrictions on size and inf. 


2.2 New Spatial Representations for Edge-Labeled Graphs 


We give predicates for adjacency matrices and edge lists for edge-labeled graphs. 


Adjacency Matrices. Adjacency matrices enable efficient label access for edge- 
labeled graphs. We support three common adjacency matrix representations: a 
stack-allocated 2D array int graph[size][size], a stack-allocated 1D array
int graph[size×size], and a heap-allocated 2D array int **graph. To the
casual observer, these are essentially interchangeable, but that is a mistake when 
thinking spatially. Apart from the arithmetic that the second flavor uses to access 
cells, there is a more subtle point: the first and second enjoy a contiguous block 
of memory, but the third does not: it is an allocated “spine” with pointers to 
separately-allocated rows. For a taste, the spatial representation of the first is: 


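\[
\begin{aligned}
\mathsf{array}(\mathit{ptr}, \mathit{list}) &\triangleq \mathop{\bigstar}_{i \in [0, |\mathit{list}|)} (\mathit{ptr} + i) \mapsto \mathit{list}[i] \\
\mathsf{arr\_addr}(\mathit{ptr}, i, \mathit{size}) &\triangleq \mathit{ptr} + (i \times \mathit{size}) \\
\mathsf{arr\_rep}(\gamma, i, \mathit{ptr}) &\triangleq \mathsf{let}\ \mathit{row} := \mathsf{graph2mat}(\gamma)[i]\ \mathsf{in}\ \mathsf{array}(\mathsf{arr\_addr}(\mathit{ptr}, i, |\mathit{row}|), \mathit{row}) \\
\mathsf{graph\_rep}(\gamma, \mathit{g\_addr}, \_) &\triangleq \mathop{\bigstar}_{v \in \gamma} \mathsf{arr\_rep}(\gamma, v, \mathit{g\_addr})
\end{aligned}
\]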


We use the separation logic * in its iterated form to say that the arrays are 
separate in memory. We elide details relating to object sizes, pointer alignment, 
and so forth, although our formal proofs handle such matters. Of particular 
note are graph2mat, which performs two projections to drag out the graph’s 
nested edge labels into a 2D matrix, and arr_addr, which in this instance simply 
computes the address of any legal row i from the base address of the graph.
Notice that this graph_rep predicate ignores its third argument. To represent a 
heap-allocated 2D array we can still use graph2mat but can no longer use address 
arithmetic; the third parameter is then a list of pointers to the row sub-arrays.

While ironing out these spatial wrinkles, we develop utilities that easily
unfold and refold our adjacency matrices, thus smoothing user experience when 
reading and writing arrays and cells. Of course these utilities themselves vary by 
flavor of representation, but the net effect is that users of our adjacency matrices 
really can be agnostic to the style of representation they are using (see §3.1). 
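To make the difference between the three flavors concrete, here is a minimal C sketch that is not taken from the verified development. The fixed bound SIZE and the helper names get_cell_2d, get_cell_1d, get_cell_spine, make_spine, and free_spine are hypothetical. The second accessor performs the same arithmetic as arr_addr, while the third follows a pointer from the spine to a separately allocated row.

```c
#include <assert.h>
#include <stdlib.h>

#define SIZE 3 /* hypothetical fixed size, so the array bounds are constants */

/* Flavor 1: 2D array, one contiguous block of SIZE * SIZE cells. */
int get_cell_2d(int graph[SIZE][SIZE], int u, int i) {
    assert(0 <= u && u < SIZE && 0 <= i && i < SIZE);
    return graph[u][i];
}

/* Flavor 2: 1D array; row u starts at offset u * SIZE, as in arr_addr. */
int get_cell_1d(const int *graph, int u, int i) {
    assert(0 <= u && u < SIZE && 0 <= i && i < SIZE);
    return graph[u * SIZE + i];
}

/* Flavor 3: heap-allocated spine of pointers to separately allocated rows. */
int get_cell_spine(int **graph, int u, int i) {
    assert(0 <= u && u < SIZE && 0 <= i && i < SIZE);
    return graph[u][i];
}

/* Copy a flat matrix into the spine representation; NULL on allocation failure. */
int **make_spine(const int *flat) {
    int **spine = malloc(SIZE * sizeof *spine);
    if (spine == NULL) return NULL;
    for (int r = 0; r < SIZE; r++) {
        spine[r] = malloc(SIZE * sizeof **spine);
        if (spine[r] == NULL) {
            while (r-- > 0) free(spine[r]); /* release the rows built so far */
            free(spine);
            return NULL;
        }
        for (int c = 0; c < SIZE; c++) spine[r][c] = flat[r * SIZE + c];
    }
    return spine;
}

void free_spine(int **spine) {
    for (int r = 0; r < SIZE; r++) free(spine[r]);
    free(spine);
}
```

All three accessors return the same label for the same (u, i), which is why client code that goes through a single cell-reading helper can stay agnostic to the representation.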


Edge Lists. Edge lists are the representation of choice for sparse graphs. Our C 
implementation defines an edge as a struct containing src, dst, and weight, 
and defines a graph as a struct containing the graph’s size, edge count, and an 
array of edges. Our spatial representation follows this pattern: 


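\[
\mathsf{graph\_rep}(\gamma, \mathit{g\_addr}, \mathit{e\_addr}) \triangleq
\bigl(\mathit{g\_addr} \mapsto (|\gamma.V|, |\gamma.E|, \mathit{e\_addr})\bigr) * \mathsf{array}(\mathit{e\_addr}, \gamma.E)
\]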
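The struct layout described above might look as follows in C. This is a sketch under assumptions: the edge fields src, dst, and weight are named in the text, but the graph field names (size, edge_count, edges) and the helper total_weight are hypothetical and need not match the verified code.

```c
#include <assert.h>
#include <stddef.h>

/* One labeled edge: endpoints plus weight. */
struct edge {
    int src;
    int dst;
    int weight;
};

/* A graph: vertex count, edge count, and a pointer to the edge array,
 * mirroring the triple (|V|, |E|, e_addr) in graph_rep. */
struct graph {
    int size;
    int edge_count;
    struct edge *edges;
};

/* Sum of all edge weights, accumulated in a wider type to avoid overflow. */
long long total_weight(const struct graph *g) {
    assert(g != NULL && g->edge_count >= 0);
    long long sum = 0;
    for (int k = 0; k < g->edge_count; k++) sum += g->edges[k].weight;
    return sum;
}
```

Nothing in this layout prevents two entries from sharing the same endpoints, so an edge list can describe a multigraph.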


2.3 Undirectedness in a Directed World 


The CertiGraph library presented in [73] supports only directed graphs, and, as 
we have seen, bakes direction-reliant idioms such as src and dst deep into its 
development. Our challenge is to add support for undirected graphs atop this.

Our approach is to observe that every directed graph can be treated as an 
undirected graph by ignoring edge direction. We develop a lightweight layer of 
“undirected flavored” definitions atop the existing “directed flavored” definitions,
state and prove connections between these, and then build the undirected
infrastructure we need. The result is that we retain full access to CertiGraph’s 
graph theory formalizations modulo some mathematical bridging. 

Our basic “undirected flavored” definitions are standard [20]. Vertices u and v 
are adjacent if there is an edge between them in either direction; vertices are 
self-adjacent. A valid upath (undirected path) is a list of valid vertices that form a
pairwise-adjacent chain. Two vertices are connected when a valid upath features
them as head and foot (essentially the transitive closure of adjacent).

The definitions above sync up with preexisting “directed flavored” definitions. 
Intuitively, undirectedness is more lax than directedness, and so it is unsurprising 
that these connections are straightforward weakenings of directed properties. We 
next give standard definitions [20] that culminate in minimum_spanning_forest, 
which is exactly our postcondition of both Prim’s and Kruskal’s algorithms.²

An undirected cycle (ucycle) is a valid non-empty upath whose first and last 
vertices are equal. A connected_graph means that any two valid vertices are 
connected. is_partial_graph f g means everything in f is in g. We proceed: 


Definition uforest g :=
  (∀ e, evalid g e → strong_evalid g e) ∧
  (∀ p l, ~ ucycle g p l).
Definition spanning g g' :=
  ∀ u v, connected g u v ↔ connected g' u v.
Definition spanning_uforest f g :=
  is_partial_graph f g ∧ uforest f ∧ spanning f g.

² That Prim’s postcondition has a forest may raise an eyebrow. See §4.2.


The strong_evalid predicate means that the src and dst of the edge are also 
valid, so e.g., a valid edge cannot point to a deleted/absent vertex. The second 
conjunct of uforest is critical: a forest has no undirected cycles. The other 
definitions are straightforward from there, and minimum_spanning_forest f g 
means that no other spanning forest has lower total edge cost than f. 
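As an executable illustration of the undirected definitions, the following C sketch tests adjacency and connectedness on a flat adjacency matrix while ignoring edge direction. It is an assumption-laden sketch, not part of the Coq development: the cap MAX_V, the function names, and the row-major matrix layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 64 /* hypothetical cap on the number of vertices */

/* u and v are adjacent if an edge exists in either direction;
 * every vertex is adjacent to itself. */
bool adjacent(const int *m, int size, int inf, int u, int v) {
    if (u == v) return true;
    return m[u * size + v] != inf || m[v * size + u] != inf;
}

/* u and v are connected if some chain of pairwise-adjacent vertices
 * links them; found here by an iterative depth-first search. */
bool connected(const int *m, int size, int inf, int u, int v) {
    assert(0 < size && size <= MAX_V);
    assert(0 <= u && u < size && 0 <= v && v < size);
    bool seen[MAX_V] = {false};
    int stack[MAX_V]; /* each vertex is pushed at most once */
    int top = 0;
    stack[top++] = u;
    seen[u] = true;
    while (top > 0) {
        int x = stack[--top];
        if (x == v) return true;
        for (int y = 0; y < size; y++) {
            if (!seen[y] && adjacent(m, size, inf, x, y)) {
                seen[y] = true;
                stack[top++] = y;
            }
        }
    }
    return false;
}
```

With directed edges 0→1 and 2→1 only, vertices 0 and 2 are not mutually reachable in the directed sense, yet connected reports them as connected, which is the weakening that the undirected layer formalizes.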

Our undirected work is also compatible with our new developments in §2.1 
and §2.2. An adjacency matrix-representable undirected graph has all the pure 
properties discussed in SoundAdjMat, and also has symmetry across the left 
diagonal. We extend SoundAdjMat into SoundUAdjMat by requiring that all valid 
edges have src < dst. This effectively “turns off” the matrix on one half of the 
diagonal and avoids double-counting. Prim’s algorithm uses SoundUAdjMat and 
places no further restrictions. Further, spatial representations and fold/unfold 
utilities are shared across directed and undirected adjacency matrices. 


3 Shortest Path 


We verify a standard C implementation of Dijkstra’s algorithm. We first sketch 
our proof in some detail with an emphasis on our loop invariants, then uncover 
and remedy a subtle overflow bug, and finish with a discussion of related work. 


3.1 Verified Dijkstra’s Algorithm in C 


Figure 1 shows the code and proof sketch of Dijkstra’s algorithm. Red text is 
used in the figure to highlight changes compared to the annotation immediately 
prior. Our code is implemented exactly as suggested by CLRS [20], so we refer 
readers there for a general discussion of the algorithm. The adjacency-matrix-represented
graph γ of size vertices is passed as the parameter g along with the
source vertex src and two allocated arrays dist and prev. The spatial predicate
array(x, v), which connects an array pointer x with its contents v, is standard and
unexciting. PQ(pq, heap) is the spatial representation of our priority queue (PQ)
and Item(i, (key, pri, data)) lays out a struct that we use to interact with the PQ;
we leave the management of the PQ to the operations described in §5. Of greater
interest is AdjMat(g, γ), which as explained in §2.2, links the concrete memory
values of g to an abstract mathematical graph γ, which in turn exposes an
interface in the language of graph theory (e.g., vertices, edges, labels). Graph γ
contains the general adjacency matrix restrictions given in §2.1 along with some 
further Dijkstra-specific restrictions to be explained in §3.2. We verify Dijkstra 
three times using different adjacency-matrix representations as explained in §2.2. 
Thanks to some careful engineering, the C code and the Coq verification are both 
almost completely agnostic to the form of representation. The only variation 
between implementations is when reading a cell (line 15), so we refactor this out 
into a straightforward helper method and verify it separately; accordingly, the 
proof bases for the three variants differ by less than 1%.

 1  void dijkstra (int **g, int src, int *dist,
 2                 int *prev, int size, int inf) {
 3    // { AdjMat(g, γ) ∗ array(dist, _) ∗ array(prev, _) ∧ src ∈ γ ∧ connected(γ, src) }
 4    Item* temp = (Item*) mallocN(sizeof(Item));
 5    int* keys = mallocN(size * sizeof(int));
 6    PQ* pq = pq_make(size); int i, u, cost;
 7    for (i = 0; i < size; i++)
 8    { dist[i] = inf; prev[i] = inf; keys[i] = pq_push(pq,inf,i); }
 9    dist[src] = 0; prev[src] = src; pq_edit_priority(pq,keys[src],0);
10    while (pq_size(pq) > 0) {
11      // { ∃dist, prev, popped, heap. AdjMat(g, γ) ∗ PQ(pq, heap) ∗ Item(temp, _) ∗
        //   array(dist, dist) ∗ array(prev, prev) ∗ array(keys, keys) ∧
        //   linked_correctly(γ, heap, keys, dist, popped) ∧
        //   dijk_correct(γ, src, popped, prev, dist) }
12      pq_pop(pq, temp); u = temp->data;
13      for (i = 0; i < size; i++) {
14        // { ∃dist', prev', heap'. AdjMat(g, γ) ∗ PQ(pq, heap') ∗
          //   array(dist, dist') ∗ array(prev, prev') ∗ array(keys, keys) ∗
          //   Item(temp, (keys[u], dist[u], u)) ∧ min(dist[u], heap') ∧
          //   linked_correctly(γ, heap', keys, dist', popped ⊎ {u}) ∧
          //   dijk_correct_weak(γ, src, popped ⊎ {u}, prev', dist', i, u) }
15        cost = getCell(g, u, i);
16        if (cost < inf) {
17          if (dist[i] > dist[u] + cost) {
18            dist[i] = dist[u] + cost; prev[i] = u;
19            pq_edit_priority(pq, keys[i], dist[i]);
20    }}}} // { ∃dist'', prev''. AdjMat(g, γ) ∗ PQ(pq, ∅) ∗ Item(temp, _) ∗
           //   array(dist, dist'') ∗ array(prev, prev'') ∗ array(keys, keys) ∧
           //   ∀dst. dst ∈ γ → inv_popped(γ, src, γ.V, prev'', dist'', dst) }
21    freeN(temp); pq_free(pq); freeN(keys); return; }


Fig. 1. C code and proof sketch for Dijkstra’s algorithm. 


Dijkstra’s algorithm uses a PQ to greedily choose the cheapest unoptimized 
vertex on line 12. The best-known distances to vertices are expected to improve 
as various edges are relaxed, and such improvements need to be logged in the 
PQ: Dijkstra’s algorithm implicitly assumes that its PQ supports the additional 
operation decrease_priority.³ Our “advanced” PQ (§5.3) supports this operation
in logarithmic time with the pq_edit_priority function.

³ Because decrease_priority is relatively complex to implement, several popular
workarounds (e.g. [12]) use simpler PQs at the cost of decreased performance.

The first nine lines are standard setup. The keys array, assigned on line 8,
is thereafter a mathematical constant. The pure predicate linked_correctly contains
the plumbing connecting the various mathematical arrays. The verification
turns on the loop invariants on lines 11 and 14. The pure while invariant
dijk_correct(γ, src, popped, prev, dist) essentially unfolds into:


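\[
\begin{aligned}
\forall \mathit{dst}.\ \mathit{dst} \in \gamma \rightarrow{} & \mathsf{inv\_popped}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst}) \wedge {} \\
& \mathsf{inv\_unpopped}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst}) \wedge {} \\
& \mathsf{inv\_unseen}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst})
\end{aligned}
\]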


That is, a destination vertex dst falls into one of three categories: 


1. inv_popped: if dst ∈ popped, then dst has been fully processed, i.e., dst is
   reachable from src via a globally-optimal path p whose vertices are all in
   popped. Path p has been logged in prev and p’s cost is given in dist.
2. inv_unpopped: if dst ∉ popped, but its known distance is less than inf, then
   dst is reachable in one step from a popped vertex mom. This route is locally
   optimal: we cannot improve the cost via an alternate popped vertex. Moreover,
   prev logs mom as the best-known way to reach dst, and dist logs the
   path cost via mom as the best-known cost.
3. inv_unseen: if dst ∉ popped and its known distance is inf, then there is no
   edge from any popped vertex to dst; in other words, dst is located deeper in
   the graph than has yet been explored.


After line 12, the above invariant is no longer true: a minimum-cost item u has 
been popped from the PQ, and so the dist and prev arrays need to be updated to 
account for this pop. The for loop does exactly this repair work. Its pure invariant
dijk_correct_weak(γ, src, popped, prev, dist, u, i) essentially unfolds into:


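\[
\begin{aligned}
& \bigl(\forall \mathit{dst}.\ \mathit{dst} \in \gamma \rightarrow \mathsf{inv\_popped}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst})\bigr) \wedge {} \\
& \bigl(\forall \mathit{dst}.\ 0 \le \mathit{dst} < i \rightarrow \mathsf{inv\_unpopped}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst}) \wedge {} \\
& \qquad \mathsf{inv\_unseen}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst})\bigr) \wedge {} \\
& \bigl(\forall \mathit{dst}.\ i \le \mathit{dst} < \mathit{size} \rightarrow \mathsf{inv\_unpopped\_weak}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst}, u) \wedge {} \\
& \qquad \mathsf{inv\_unseen\_weak}(\gamma, \mathit{src}, \mathit{popped}, \mathit{prev}, \mathit{dist}, \mathit{dst}, u)\bigr)
\end{aligned}
\]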


We now have five cases, many of which are familiar from dijk_correct: 


1. inv_popped: as before; if dst ∈ popped, then it has been fully processed.
   For all “previously-popped vertices” (i.e., except for u), this is trivial from
   dijk_correct. For u itself, we reach the heart of Dijkstra’s correctness: the
   locally-optimal path to the cheapest unpopped vertex is globally optimal.
2. inv_unpopped (less than i): as before, dst is reachable in one hop from a
   popped vertex mom, where now mom could be u. Initially this is trivial since
   i = 0, and we restore it as i increments by updating costs when they can be
   improved, as on lines 18 and 19.
3. inv_unseen (less than i): as before; some previously unseen neighbors of u
   may be transferred to unpopped status. This is also restored as i increments.
4. inv_unpopped_weak (between i and size): dst is reachable in one hop from
   a previously-popped vertex mom, with potentially further improvements possible
   via u. As i increments, we strengthen it into inv_unpopped after considering
   whether routing via u improves the best-known cost to dst.
5. inv_unseen_weak (between i and size): no edge exists from any previously-popped
   vertex to dst, but there may be one from u. As i increments, we
   consider whether routing via u reveals a path to dst. This is strengthened
   into inv_unpopped if so, and into inv_unseen if not.


At the end of the for loop the fourth and fifth cases fall away (i = size), and 
the PQ and the dist and prev arrays finish “catching up” to the pop on line 12. 
This allows us to infer the while invariant dijk_correct, and thus continue the 
while loop. The while loop itself breaks when all vertices have been popped and 
processed. The second and third clauses of the while loop invariant dijk_correct 
then fall away, as seen on line 20: all vertices satisfy inv_popped, and are either 
optimally reachable or altogether unreachable. We are done. 
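Some consequences of the final postcondition can be spot-checked at run time. The C sketch below is hypothetical and not part of the verified development: the function name and the flat row-major matrix layout are assumptions. It checks three necessary conditions on the output arrays: the source has distance 0 and is its own predecessor, every finite distance equals the predecessor's distance plus the connecting edge, and no edge out of a reached vertex can still be relaxed. These checks are weaker than inv_popped and are no substitute for the proof.

```c
#include <assert.h>
#include <stdbool.h>

/* g is a flat size*size matrix of edge weights; inf marks an absent edge
 * and an unreached vertex. Returns false when dist/prev are inconsistent. */
bool check_shortest_paths(const int *g, int size, int inf, int src,
                          const int *dist, const int *prev) {
    assert(size > 0 && 0 <= src && src < size);
    if (dist[src] != 0 || prev[src] != src) return false;

    /* Every reached vertex must extend the logged path of its predecessor. */
    for (int v = 0; v < size; v++) {
        if (v == src || dist[v] == inf) continue;
        int mom = prev[v];
        if (mom < 0 || mom >= size || dist[mom] == inf) return false;
        int w = g[mom * size + v];
        if (w == inf || (long long)dist[mom] + w != dist[v]) return false;
    }

    /* No edge may offer a cheaper route; the sum is done in long long
     * so that this check cannot itself overflow. */
    for (int u = 0; u < size; u++) {
        if (dist[u] == inf) continue;
        for (int v = 0; v < size; v++) {
            int w = g[u * size + v];
            if (w != inf && (long long)dist[u] + w < dist[v]) return false;
        }
    }
    return true;
}
```

A checker of this kind can catch a corrupted dist array, but it does not rule out every wrong answer; only the invariant-based proof does that.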


3.2 Overflow in Dijkstra’s Algorithm 


Dijkstra’s algorithm clearly cannot work when a path cost is more than INT_MAX.
A reasonable-looking restriction is to bound edge costs by
$\lfloor \frac{\texttt{INT\_MAX}}{\mathit{size} - 1} \rfloor$, since the
longest optimal path has size − 1 links and so the most expensive possible path
costs no more than INT_MAX. However, this has two flaws.

First, since we are writing real code in C, rather than pseudocode in an
idealized setting, we must reserve some concrete int value inf for “infinity”.
Suppose we set inf = INT_MAX, and that size − 1 divides INT_MAX. Now the
longest path can have cost
$(\mathit{size} - 1) \cdot \lfloor \frac{\texttt{INT\_MAX}}{\mathit{size} - 1} \rfloor = \texttt{INT\_MAX} = \mathit{inf}$.
This creates an unpleasant ambiguity: we cannot tell if the farthest vertex is
unreachable, or if it is reachable with legitimate cost INT_MAX. We need to adjust
our maximum edge weights to leave room for inf; using
$\lfloor \frac{\texttt{INT\_MAX} - 1}{\mathit{size} - 1} \rfloor$ solves this first issue.


Second, even though the best-known distances start at inf (see line 8) and
only ever decrease from there, the code can overflow on lines 17 and 18. Consider
applying Dijkstra’s algorithm on a 32-bit unsigned machine to the graph in
Fig. 2. The size of the graph is 3 nodes, and the proposed edge-weight upper
bound is
$\lfloor \frac{\texttt{INT\_MAX} - 1}{\mathit{size} - 1} \rfloor = \lfloor \frac{(2^{32} - 1) - 1}{3 - 1} \rfloor = 2^{31} - 1$,
for example as in the graph pictured in Fig. 2. A glance at the figure shows that
the true distances from the source A to vertices B and C are $2^{31} - 1$ and
$2^{32} - 2$ respectively. Both values are representable with 32 bits, and neither
distance is $\mathit{inf} = 2^{32} - 1$, so naively all seems well. Unfortunately,
Dijkstra’s algorithm does not exactly work like that.

After processing vertices A and B, $2^{31} - 1$ and $2^{32} - 2$ are the costs reflected
in the dist array for B and C respectively, but unfortunately vertex C is still in
the priority queue. After vertex C is popped on line 12, we fetch its neighbors in
the for loop; the cost from C to B ($2^{31} - 1$) is fetched on line 15. On line 17 the
currently optimal cost to B ($2^{31} - 1$) is compared with the sum of the optimal
cost to C ($2^{32} - 2$) plus the just-retrieved cost of the edge from C to B ($2^{31} - 1$).
Naively, $(2^{32} - 2) + (2^{31} - 1)$ is greater than the currently optimal cost $2^{31} - 1$,
so the algorithm should stick with the latter. However, $(2^{32} - 2) + (2^{31} - 1)$
overflows, with $((2^{32} - 2) + (2^{31} - 1)) \bmod 2^{32} = 2^{31} - 3$, which is less than
$2^{31} - 1$! Thus the code decides that a new cheaper path from A to B exists (in
particular, A⇝B⇝C⇝B) and then trashes the dist and prev arrays on line 18.

Fig. 2. A graph that will result in overflow on a 32-bit machine.
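The arithmetic in this example can be replayed directly in C. The sketch below is illustrative only: the function names relax_wrapping and relax_exact are hypothetical, and uint32_t stands in for the "32-bit unsigned machine" of the example. It contrasts the line-17 comparison carried out modulo $2^{32}$ with the same comparison carried out in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* The comparison on line 17, evaluated in 32-bit unsigned arithmetic:
 * the sum silently wraps modulo 2^32 before the comparison. */
bool relax_wrapping(uint32_t dist_i, uint32_t dist_u, uint32_t cost) {
    return dist_i > (uint32_t)(dist_u + cost);
}

/* The same comparison with the sum widened to 64 bits, so that it
 * reflects the true mathematical value. */
bool relax_exact(uint32_t dist_i, uint32_t dist_u, uint32_t cost) {
    return (uint64_t)dist_i > (uint64_t)dist_u + (uint64_t)cost;
}
```

With dist_i = $2^{31} - 1$ (the cost to B), dist_u = $2^{32} - 2$ (the cost to C), and cost = $2^{31} - 1$, relax_wrapping accepts the bogus improvement while relax_exact rejects it. Widening only illustrates the problem here; the remedy pursued in the text is a tighter bound on edge weights.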

Our code uses signed int rather than unsigned int so we have undefined
behavior rather than defined-but-wrong behavior, but the essence of the overflow
is identical. We ensure that the “probing edge” does not overflow by restricting
the maximum edge cost further, from
$\lfloor \frac{\texttt{INT\_MAX} - 1}{\mathit{size} - 1} \rfloor$ to
$\lfloor \frac{\texttt{INT\_MAX}}{\mathit{size}} \rfloor$. In Fig. 2, edge
weights should be bounded by $\lfloor \frac{2^{32} - 1}{3} \rfloor = 1{,}431{,}655{,}765$;
call this value w. Suppose we change the edge weights in Fig. 2 from $2^{31} - 1$ to w.
Now vertex B has distance w and C has distance $2 \cdot w$. When we remove C from
the priority queue, the comparison on line 17 is between the known best cost to B
(i.e., w) and the candidate best cost to B via C (i.e.,
$3 \cdot w = 2^{32} - 1 = \texttt{INT\_MAX}$). There is no
overflow, so the candidate is rejected and the code behaves as advertised.

We fold these new restrictions into the mathematical graph γ. In addition
to the bounds discussed above, we require a few other more straightforward
bounds: edge costs be non-negative, as is typical for Dijkstra;
$4 \cdot \mathit{size} \le \texttt{INT\_MAX}$,
to ensure that the multiplication in the malloc on line 5 does not overflow;
and that $\lfloor \frac{\texttt{INT\_MAX}}{\mathit{size}} \rfloor \cdot (\mathit{size} - 1) < \mathit{inf}$,
so no valid path has cost inf. These
bounds are optimal: if the input is any less restricted, the postcondition will fail.
The last restriction on inf is not sufficient when size = 1, so in that special
case we further require that any (self-loop) edges cost less than inf. Whenever
$0 < 4 \cdot \mathit{size} \le \texttt{INT\_MAX}$, the restrictions on inf are satisfiable
with $\mathit{inf} = \texttt{INT\_MAX}$.
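The bounds above can also be expressed as a run-time precondition check. The following C sketch is hypothetical: the function names are assumptions, the matrix is taken to be a flat row-major array, and the constant 4 assumes sizeof(int) is 4. It mirrors the restrictions listed in the text rather than the Coq definitions themselves.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Largest admissible edge weight: floor(INT_MAX / size). */
int max_edge_weight(int size) {
    assert(size > 0);
    return INT_MAX / size;
}

/* Returns true when g, size, and inf satisfy the bounds discussed above. */
bool dijkstra_bounds_ok(const int *g, int size, int inf) {
    /* 0 < 4 * size <= INT_MAX, phrased so the test itself cannot overflow. */
    if (size <= 0 || size > INT_MAX / 4) return false;
    if (inf <= 0) return false;

    int w_max = INT_MAX / size;
    /* No valid path of at most size - 1 edges may cost inf or more. */
    if ((long long)w_max * (size - 1) >= inf) return false;

    for (long long k = 0; k < (long long)size * size; k++) {
        int w = g[k];
        if (w == inf) continue; /* absent edge */
        /* Valid edges are non-negative, at most w_max, and below inf
         * (the last test matters for self-loops when size is 1). */
        if (w < 0 || w > w_max || w > inf) return false;
    }
    return true;
}
```

For a three-vertex graph with inf set to INT_MAX, a weight of INT_MAX / 3 passes this check and INT_MAX / 3 + 1 fails it, matching the role of w in the example.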


3.3 Related Work on Dijkstra in Algorithms and Formal Methods 


We were not able to find a reference that gives a robust, precise, and full descrip- 
tion of the overflow issue we describe above. Dijkstra’s original paper [22] ignores 
the issue, as do the standard textbooks Introduction to Algorithms (a.k.a. CLRS) 
by Cormen et al. [20] and Algorithm Design by Kleinberg and Tardos [38]. 
Sedgewick’s book on graph algorithms in C [66] sidesteps the overflow in line 17 
by requiring weights be in double, which does have a well-defined positive infin- 
ity value and cannot overflow in the traditional sense; Sedgewick and Wayne’s 
Algorithms textbook in Java does the same [67]. However, Sedgewick’s sidestep 
entails enduring the inevitable round-off intrinsic to floating-point arithmetic, 
and so his algorithm computes approximate optimal costs rather than exact ones. 
Sedgewick does not specify any bounds on input edge weights, and accordingly 
does not (and cannot) provide any bound on this accumulated error. Sedgewick 
is silent on how to handle an int-weighted input graph. Skiena’s Algorithm
Design Manual [68] contains code with exactly the bug we identify: he uses integer
weights and does not specify any bounds. To its credit, Heineman et al.’s
Algorithms in a Nutshell [33] takes int edge weights as inputs and mentions 
overflow as a possibility. Heineman et al. hustle their way around this overflow 
by performing the arithmetic in line 17 in long. However, this cast does not 
really handle the problem in a fundamental way: if edge weights are given in 
long rather than int, then it would be necessary to cast to long long; if edge 
weights are given in long long, then Heineman’s hustle breaks as there is no big- 
ger type to which to cast. Moreover, Heineman et al. do not bound edge weights, 
so when the cumulative edge weights are too high their code fails silently. 

Chen verified Dijkstra in Mizar [15], Gordon et al. formalized the reachability
property in HOL [29], Moore and Zhang verified it in ACL2 [53], Mange and
Kuhn verified it in Jahob [52], Filliâtre in Why3 [25], and Klasen verified it in
KeY [37]. Liu et al. took an alternative SMT-based approach to verify a Java 
implementation of Dijkstra [51]. The most recent effort (2019) is by Lammich et 
al., working within Isabelle/HOL, although they only return the weight of the 
shortest path rather than the path itself [45]. In general the previous mechanized 
proofs on Dijkstra verify code defined within idealized formal environments, e.g. 
with unbounded integers rather than machine ints and a distinguished non- 
integer value for infinity. No previous work mentions the overflow we uncover. 


4 Minimum Spanning Trees 


Here we discuss our verifications of the classic MST algorithms Prim and 
Kruskal. Although our machine-checked proofs are about real C code, in this 
section we take a higher-level approach than we did in §3, focusing on our 
key algorithmic findings and overall experience. Accordingly, we only provide 
pseudocode for Prim’s algorithm rather than a decorated program and do not 
show any code for Kruskal’s. Our development contains our C code and formal 
proofs [1]. 


4.1 Prim’s Algorithm 


We put the pseudocode for Prim’s algorithm in Fig. 3; the code on the left-hand 
side is directly from CLRS, whereas the code on the right omits line 5 and will 
be discussed in §4.2. Note that line 12 contains an implicit call to the PQ’s 
edit_priority. Since the pseudocode only compares keys (i.e., edge weights) 
rather than doing arithmetic on them a la Dijkstra, there are no potential over- 
flows and it is reasonable to set INF to INT_MAX in C. 

Indeed, our initial verifications of C code were largely “turning the crank” 
once we had the definitions and associated lemma support for pure/abstract 
undirected graphs, forests, etc. discussed in §2.3. Accordingly, our initial con- 
tribution was a demonstration that this new graph machinery was sufficient 
to verify real code. We also proved that our extensions to CertiGraph from §2 
were generic rather than verification-specific by reusing much pure and spatial 
reasoning that had been originally developed for our verification of Dijkstra.

 1  MST-PRIM(G,w,r):                      MST-NOROOT-PRIM(G,w):
 2    for each u in G.V                     for each u in G.V
 3      u.key = INF                           u.key = INF
 4      u.parent = NIL                        u.parent = NIL
 5    r.key = 0
 6    Q = G.V                               Q = G.V
 7    while Q ≠ ∅                           while Q ≠ ∅
 8      u = EXTRACT-MIN(Q)                    u = EXTRACT-MIN(Q)
 9      for each v in G.Adj[u]                for each v in G.Adj[u]
10        if v ∈ Q and w(u,v) < v.key           if v ∈ Q and w(u,v) < v.key
11          v.parent = u                          v.parent = u
12          v.key = w(u,v)                        v.key = w(u,v)


Fig. 3. Left: Prim’s algorithm from CLRS [20]. Right: the same omitting line 5. 


4.2 Prim’s Algorithm Handles Multiple Components Out 
of the Box 


Textbook discussions of Prim’s algorithm are usually limited to single-component 
input graphs (a.k.a. connected graphs), producing a minimum spanning tree. It 
is widely believed that Prim’s is not directly applicable to graphs with multiple 
components, which should produce a minimum spanning forest. For example, 
both Rozen [65] and Sedgewick et al. [66,67] leave the extension to multiple 
components as a formal exercise for the reader, whereas Kepner and Gilbert 
suggest that multiple-component graphs should be handled by first finding the 
components and then running Prim on each component [36]. 

After we completed our initial verification, a close examination of our formal 
invariants showed us that the algorithm exactly as given by standard textbooks 
will properly handle multi-component graphs in a single run. The confusion 
starts because, in a fully connected graph, any vertex u removed from the PQ 
on line 8 must have u.key < INF; i.e., u must be immediately reachable from 
the spanning tree that is in the process of being built. However, nothing in the 
code relies upon this connectedness fact! All we need is that u is the “closest 
vertex” to the “current component.” If u.key = INF and u is a minimum of the 
PQ, then it simply means that the “previous component” is done, and we have 
started spanning tree construction on a new unconnected component “rooted” 
at u, yielding a forest. The node u’s parent will remain NIL, as it was after the
setup loop on line 4, indicating that it is the root of a spanning tree. Its key will
be INF rather than 0, but the keys are internal to Prim’s algorithm: clients only
get back the spanning forest as encoded in the parent pointers.⁴

⁴ The keys simply record the edge weight connecting a vertex to its candidate parent;
recall that line 12 is really a call to the PQ’s edit_priority. If a client wishes to
know this edge weight, it can simply look up the edge in the graph.

Having made this discovery, we updated our proofs to support the new weaker
precondition, which is what we currently formally verify in Coq [71]. A little further
thought led to the realization that since Prim can handle arbitrary numbers
of components, the initialization of the root’s key in line 5 is in fact unnecessary.
Accordingly, if we remove this line and the associated function argument
r from MST-PRIM (i.e., the code on the right half of Fig. 3), the algorithm still
works correctly. Moreover, the program invariants become simpler because we
no longer need to treat a specified vertex (r) in a distinguished manner. Our
formal development verifies this version of the algorithm as well [1].
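A direct C transcription of the rootless pseudocode shows the multi-component behavior in action. This is a sketch under assumptions, not the verified implementation: it takes a symmetric flat adjacency matrix rather than the half-matrix of SoundUAdjMat, replaces the priority queue with a linear scan, and uses the hypothetical names prim_noroot, MAX_V, and NIL.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_V 64 /* hypothetical cap on the number of vertices */
#define NIL (-1) /* marks a vertex with no parent, i.e. a tree root */

/* g is a symmetric size*size matrix; inf marks an absent edge.
 * On return, parent[] encodes a spanning forest. */
void prim_noroot(const int *g, int size, int inf, int *parent) {
    assert(0 < size && size <= MAX_V);
    int key[MAX_V];
    bool in_q[MAX_V];
    for (int u = 0; u < size; u++) {
        key[u] = inf;
        parent[u] = NIL;
        in_q[u] = true;
    }
    for (int round = 0; round < size; round++) {
        /* EXTRACT-MIN by linear scan; a key of inf starts a new component. */
        int u = -1;
        for (int v = 0; v < size; v++) {
            if (in_q[v] && (u < 0 || key[v] < key[u])) u = v;
        }
        in_q[u] = false;
        for (int v = 0; v < size; v++) {
            int w = g[u * size + v];
            if (in_q[v] && w < key[v]) { /* never true when w is inf */
                parent[v] = u;
                key[v] = w;
            }
        }
    }
}
```

On a graph with components {0, 1, 2} and {3, 4}, a single call leaves exactly two vertices with parent NIL, one root per component, without any component-finding pass.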


4.3 Related Work on Prim in Algorithms and Formal Methods 


Prim’s algorithm was in fact first developed by the Czech mathematician Vojtěch
Jarník in 1930 [35] before being rediscovered by Robert Prim in 1957 [61] and a
third time by Edsger W. Dijkstra in 1959 [22]. Both Prim’s and Dijkstra’s treatments
explicitly assume a connected graph; although we cannot read Czech, some
time with Google Translate suggests that Jarník’s treatment probably does the
same. The textbooks we surveyed [20,36,38,65-68] seem to derive from Prim’s
and/or Dijkstra’s treatment. More casual references such as Wikipedia [3] and 
innumerable lecture slides are presumably derived from the textbooks cited. We 
have not found any references that state that Prim’s algorithm without modi- 
fication applies to multi-component graphs, even when executable code is pro- 
vided: e.g., Heineman et al. provide C++ code that aligns closely with our 
C code [33], but do not mention that their code works equally well on multi- 
component graphs. Sadly, many sources promulgate the false proposition that 
modifications to the algorithm are needed to handle multi-component graphs 
(e.g., [8,36,65-67]). Likewise, we have found no reference that removes the ini- 
tialization step (line 5 in Fig. 3) from the standard algorithm. 

Prim’s algorithm has been the focus of a few previous formalization efforts. 
Guttman formalised and proved the correctness of Prim’s algorithm using Stone- 
Kleene relation algebras in Isabelle/HOL [30]. He works in an idealized formal 
environment that does not require the development of explicit data structures; 
his code does not appear to be executable. Lammich et al. provided a verification 
of Prim’s algorithm [45]. Lammich et al. also work within the idealized formal 
environment of Isabelle/HOL, but, in contrast to Guttman, develop efficient 
purely functional data structures and extract them to executable code. Both 
Guttman and Lammich explicitly require that the input graph be connected. 


4.4 Kruskal’s Algorithm 


Although Kruskal’s algorithm is sometimes presented as taking connected graphs 
and producing spanning trees, the literature also discusses the more general 
case of multi-component input graphs and spanning forests. However, Kruskal 
has only recently been the focus of formal verification efforts, partly because it 
relies on the notoriously difficult-to-verify union-find algorithm; fortunately, the 
CertiGraph project has an existing fully-verified union-find implementation that 
we can leverage [73]. Kruskal also requires a sorting function; we implemented 
heapsort as explained in §5.2. Kruskal is optimized for compact representations 
of sparse graphs, so the O(1) space cost of heapsort is a reasonable fit. 
The primary interest of our verification of Kruskal is in our proof engineering.
Kruskal inputs graphs as edge lists rather than adjacency matrices. In addition 
to requiring an addition to our spatial graph predicate menu, this means that 
Kruskal’s input graphs can have multiple edges between a given pair of vertices 
(i.e., a “multigraph”). Pleasingly, we can reuse most of the undirected graph 
definitions (§2.3), demonstrating that they are generic and reusable. 

Another challenge is integrating the pre-existing CertiGraph verification of 
union-find. We are pleased to say that no change was required for CertiGraph’s 
existing union-find definitions, lemmas, specifications and verification. Kruskal 
actually manipulates two graphs simultaneously: a directed graph with vertex 
labels (to store parent pointers and ranks) within union-find, and an undirected 
multigraph with edge labels (for which the algorithm is constructing a spanning 
forest). Beyond showing that CertiGraph was capable of this kind of systems- 
integration challenge, we had to develop additional lemma support to bridge the 
directed notion of “reachability,” used within the directed union-find graph to 
the undirected notion of “connectedness,” used in the MSF graph (§2.3). 
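
To make the interplay between the two graphs concrete, the following is a minimal, unverified sketch of Kruskal's algorithm over an edge list. It is not the CertiGraph code: the names Edge, Node, uf_find, uf_union, and kruskal_msf are hypothetical, the union-find forest is stored in a plain array, and the standard library's qsort stands in for the heapsort discussed in §5.2. Parallel edges and multi-component inputs need no special handling, so the result is a spanning forest.

```c
#include <stdlib.h>

/* Hypothetical edge-list entry: an edge label (weight) and two endpoints. */
typedef struct { int weight; unsigned int u, v; } Edge;

/* Hypothetical union-find node: parent pointer and rank (the vertex labels
   of the directed graph). */
typedef struct { unsigned int parent; unsigned int rank; } Node;

static unsigned int uf_find(Node *uf, unsigned int x) {
  while (uf[x].parent != x) {
    uf[x].parent = uf[uf[x].parent].parent;  /* path halving */
    x = uf[x].parent;
  }
  return x;
}

static void uf_union(Node *uf, unsigned int a, unsigned int b) {
  a = uf_find(uf, a);
  b = uf_find(uf, b);
  if (a == b) return;
  if (uf[a].rank < uf[b].rank) { unsigned int t = a; a = b; b = t; }
  uf[b].parent = a;                          /* attach the shallower tree */
  if (uf[a].rank == uf[b].rank) uf[a].rank++;
}

static int by_weight(const void *p, const void *q) {
  int a = ((const Edge *)p)->weight, b = ((const Edge *)q)->weight;
  return (a > b) - (a < b);
}

/* Sorts edges in place, writes the selected edges to msf (room for
   num_edges entries), and returns how many were selected. The caller
   supplies uf with room for num_vertices nodes. */
size_t kruskal_msf(Edge *edges, size_t num_edges, unsigned int num_vertices,
                   Node *uf, Edge *msf) {
  size_t chosen = 0;
  for (unsigned int v = 0; v < num_vertices; v++) {
    uf[v].parent = v;
    uf[v].rank = 0;
  }
  qsort(edges, num_edges, sizeof(Edge), by_weight);
  for (size_t i = 0; i < num_edges; i++) {
    /* Directed reachability to a common root in the union-find forest
       corresponds to undirected connectedness in the forest built so far. */
    if (uf_find(uf, edges[i].u) != uf_find(uf, edges[i].v)) {
      uf_union(uf, edges[i].u, edges[i].v);
      msf[chosen++] = edges[i];
    }
  }
  return chosen;
}
```

On a five-vertex input with components {0,1,2} and {3,4} and a pair of parallel edges between 0 and 1, the sketch selects three edges and leaves the two components in separate union-find classes.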


4.5 Related Work on Kruskal in Algorithms and Formal Methods 


Joseph Kruskal published his algorithm in 1956 [42] and it has appeared in 
numerous textbooks since (e.g., [20,38,66,68]). Kruskal’s algorithm is usually 
preferred over Prim’s for sparse graphs, and is sometimes presented as “the 
right choice” when confronted with multi-component graphs under the mistaken 
assumption that Prim’s first requires a component-finding initial step. 

Guttmann generalized minimum spanning tree algorithms using Stone relation
algebras [31], and provided a proof of Kruskal’s algorithm formatted in said
algebras. As in his work on Prim’s [30], Guttmann works within Isabelle/HOL and
does not include concrete data structures such as priority-queues and union-find, 
instead capturing their action as equivalence relations in the underlying algebras. 
In Guttmann’s Kruskal paper, he mentions that his Prim paper axiomatizes the 
fact that “every finite graph has a minimum spanning forest,” which he is then 
able to prove using his Kruskal algorithm. Interestingly, our Prim verification 
needs the same fact, but we prove it directly. 

In a similar vein, Haslbeck et al. verified Kruskal’s algorithm [32] by building 
on Lammich et al.’s earlier work on Prim [45]. Like Lammich et al., Haslbeck et 
al. work within Isabelle/HOL with a focus on purely functional data structures. 

One of the stumbling blocks in verifying Kruskal’s algorithm is the need 
to verify union-find. In addition to CertiGraph [73], two recent efforts to certify 
union-find are by Charguéraud and Pottier, who also prove time complexity [14]; 
and by Filliatre [26], whose proof benefits from a high degree of automation. 


5 Verified Binary Heaps in C 


A binary heap embeds a heap-ordered tree in an array and uses arithmetic on 
indices to navigate between a parent and its left and right children [20]. In addi- 
tion to providing the standard insert and remove-min/remove-max operations 
(depending on whether it is a min- or max-ordered heap) in logarithmic time,
binary heaps can be upgraded to support two nontrivial operations. First, Floyd’s
heapify function builds a binary heap from an unordered array in linear time, 
and as a related upgrade, heapsort performs a worst-case linearithmic-time sort 
using only constant additional space. Second, binary heaps can be upgraded to 
support logarithmic-time decrease- and increase-priority operations, which 
we generalize straightforwardly into edit_priority. 

Binary heaps are a good fit for our graph algorithms because Dijkstra’s 
and Prim’s algorithms need to edit priorities, and a constant-space heapsort 
is appropriate for the sparse edge-list-represented graphs typically targeted by 
Kruskal’s. The C language has poor support for polymorphic higher-order func- 
tions, and a binary heap that supports edit_priority is half as fast as a binary 
heap that does not. Accordingly, we implement binary heaps in C three times: 


Name      Heap order  edit_priority  heapify  Payload
basic     min         no             yes      void*
advanced  min         yes            no       int
Kruskal   max         no             yes      int, int (i.e., unboxed)


Priorities are of type int. The Kruskal-specific implementation is stripped down 
to the bare minimum required to implement heapsort (e.g., it does not support 
insert). We next overview these verifications in three parts: basic heap opera- 
tions, heapify and heapsort operations, and the edit_priority operation. 


5.1 The Basic Heap Operations of Insertion and Min/Max-Removal 


Because we are juggling three implementations, we take some care to factor 
our verification to maximize reuse. First, each C implementation has its own 
exchange and comparison functions that handle the nitty-gritty of the payload 
and choose between a min or max heap. The following lines are from the “basic” 
implementation, in which the “payload” (data field) is of type void*:


 5 void exch(unsigned int j, unsigned int k, Item arr[]) {
 6   int priority = arr[j].priority; void* data = arr[j].data;
 7   arr[j].priority = arr[k].priority; arr[j].data = arr[k].data;
 8   arr[k].priority = priority; arr[k].data = data; }
 9 int less(unsigned int j, unsigned int k, Item arr[]) {
10   return (arr[j].priority <= arr[k].priority); }


These C functions are specified as refinements of Gallina functions that exchange 
polymorphic data in lists and compare objects in an abstract preordered set; we 
verify them in VST after a little irksome engineering. The payoff is that the key 
heap operations, which, following Sedgewick [66], we call swim and sink, can 
use identical C code (up to alpha renaming) in all three implementations: 


11 void swim(unsigned int k, Item arr[]) {
12   while (k > ROOT_IDX && less(k, PARENT(k), arr)) {
13     exch(k, PARENT(k), arr); k = PARENT(k); } }
14 void sink(unsigned int k, Item arr[], unsigned int available) {
15   while (LEFT_CHILD(k) < available) {
16     unsigned j = LEFT_CHILD(k);
17     if (j+1 < available && less(j+1, j, arr)) j++;
18     if (less(k, j, arr)) break; exch(k, j, arr); k = j; }}


These functions involve a number of complexities, both at the algorithms level 
and at the semantics-of-C level. At the C level, there is the potential for a rather 
subtle bug in the macros ROOT_IDX, PARENT, etc. Abstractly, these are simple: the 
root is in index 0; the children of x are at roughly 2x and the parent at roughly x/2,
with +1 as necessary. The danger is thinking that because the variables are 
unsigned int, all arithmetic will occur in this domain; in fact we must force 
the associated constants into unsigned int as well: 


1 #define ROOT_IDX 0u
2 #define PARENT(x) (x-1u)/2u
3 #define LEFT_CHILD(x) (2u*x)+1u
4 #define RIGHT_CHILD(x) 2u*(x+1u)


A second C-semantics issue is the potential for overflow within LEFT_CHILD and 
RIGHT_CHILD (as well as the increments on line 17), and underflow within the 
PARENT macro (if x should ever be 0). To avoid this overflow, the precondi- 
tion of sink requires that when k is in bounds (i.e., k < available), then 
2·(available−1) < max_unsigned. An edge case occurs when deleting the last
element from a heap (k = available); we then require 2·k < max_unsigned.
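
The wrap-around behaviour behind these side conditions can be observed directly. The sketch below is illustrative only: it uses fully parenthesized variants of the index macros (a defensive choice made for this example, not the paper's exact text), and sink_indices_safe is a hypothetical runtime analogue of the precondition of sink, written so that the check itself cannot overflow.

```c
#include <limits.h>

/* Parenthesized variants of the index macros; constants keep the u suffix. */
#define ROOT_IDX 0u
#define PARENT(x) (((x) - 1u) / 2u)
#define LEFT_CHILD(x) (2u * (x) + 1u)
#define RIGHT_CHILD(x) (2u * ((x) + 1u))

/* Hypothetical helper: returns 1 when the stated bound for sink holds,
   i.e. 2*(available-1) < UINT_MAX for k < available, or 2*k < UINT_MAX
   for the edge case k == available. */
static int sink_indices_safe(unsigned int k, unsigned int available) {
  /* UINT_MAX is odd, so 2*n < UINT_MAX holds exactly when n <= half. */
  const unsigned int half = (UINT_MAX - 1u) / 2u;
  if (k < available) return available - 1u <= half;
  if (k == available) return k <= half;
  return 0; /* k beyond the live part: outside the stated precondition */
}
```

Here PARENT(ROOT_IDX) evaluates to UINT_MAX / 2u because 0u - 1u wraps to UINT_MAX, and LEFT_CHILD(UINT_MAX / 2u + 1u) wraps around to 1u. Both results are well-defined in C but meaningless as heap indices, which is why the specification rules them out.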
At the algorithmic level, both the swim and sink functions involve nontrivial 
loop invariants; sink is complicated by the further need to support Floyd’s 
heapify, during which a large portion of the array is unordered. Accordingly, 
we build Gallina models of both functions and show that they restore heap order 
given a mostly-ordered input heap. There are two different versions of “mostly- 
ordered”. Specifically, swim uses a “bottom-up” version: 
Definition weak_heapOrdered2 (L : list A) (j : nat) : Prop :=
  (∀ i b, i ≠ j → nth_error L i = Some b →
     ∀ a, nth_error L (parent i) = Some a → a ≤ b) ∧
  (grandsOk L j root_idx).


whereas sink uses a “top-down” version: 


Definition weak_heapOrdered_bounded (L:list A) (k:nat) (j:nat) :=
  (∀ i a, i ≥ k → i ≠ j → nth_error L i = Some a →
     (∀ b, nth_error L (left_child i) = Some b → a ≤ b) ∧
     (∀ c, nth_error L (right_child i) = Some c → a ≤ c)) ∧
  (grandsOk L j k).


The parameter j indicates a “hole”, at which the heap may not be heap-ordered; 
grandsOk bridges this hole by ordering the parent and the children of j: 

Definition grandsOk (L : list A) (j : nat) (k : nat) : Prop :=
  j ≠ root_idx → parent j ≥ k →
  ∀ gs bb, parent gs = j → nth_error L gs = Some bb →
  ∀ a, nth_error L (parent j) = Some a → a ≤ bb.


The parameter k is used to support Floyd’s heapify: it bounds the portion of 
the list in which elements are heap-ordered (with the exception of j). The proofs 
that the Gallina swim and sink can restore (bounded) heap-orderedness involve a
number of edge cases, but given the above definitions go through. The invariants 
of the C versions of swim and sink are stated via the associated Gallina versions, 
thereby delegating all heap-ordering proofs to the Gallina versions. 

The insertion and remove functions we verify are in fact “non-checking” 
versions (insert_nc and remove_nc): their preconditions assume there is room 
in the heap to add or an item in the heap to remove. In the context of Dijkstra 
and Prim, these preconditions can be proven to hold. The associated verifications 
involve a little separation logic hackery (specifically, to FRAME away the “junk” 
part of the heap-array from the “live” part), but are straightforward using VST. 
We avoid the overflow issue in sink by bounding the maximum capacity of the 
heap: 4 < 12 · capacity < max_unsigned; the magic number 12 comes from the
size of the underlying data structure in C. We require users to prove this bound 
on heap creation, and thereafter handle it under the hood. 
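
The shape of the non-checking operations can be sketched as follows. This is a simplified illustration rather than the verified code: the Heap layout and its field names are hypothetical, the payload is an int instead of a void*, and the assert calls merely stand in for the preconditions that the verified versions discharge statically.

```c
#include <assert.h>

#define ROOT_IDX 0u
#define PARENT(x) (((x) - 1u) / 2u)
#define LEFT_CHILD(x) (2u * (x) + 1u)

typedef struct { int priority; int data; } Item;  /* simplified payload */

typedef struct {                 /* hypothetical layout */
  unsigned int capacity;         /* total number of cells */
  unsigned int first_available;  /* cells [0, first_available) are live,
                                    the remaining cells are junk */
  Item *heap_cells;
} Heap;

static void exch(unsigned int j, unsigned int k, Item arr[]) {
  Item t = arr[j]; arr[j] = arr[k]; arr[k] = t;
}

static int less(unsigned int j, unsigned int k, Item arr[]) {
  return arr[j].priority <= arr[k].priority;  /* min-heap */
}

static void swim(unsigned int k, Item arr[]) {
  while (k > ROOT_IDX && less(k, PARENT(k), arr)) {
    exch(k, PARENT(k), arr);
    k = PARENT(k);
  }
}

static void sink(unsigned int k, Item arr[], unsigned int available) {
  while (LEFT_CHILD(k) < available) {
    unsigned int j = LEFT_CHILD(k);
    if (j + 1 < available && less(j + 1, j, arr)) j++;
    if (less(k, j, arr)) break;
    exch(k, j, arr);
    k = j;
  }
}

/* Precondition: there is room for one more item. */
void insert_nc(Heap *h, int priority, int data) {
  assert(h->first_available < h->capacity);
  h->heap_cells[h->first_available].priority = priority;
  h->heap_cells[h->first_available].data = data;
  swim(h->first_available, h->heap_cells);
  h->first_available++;
}

/* Precondition: there is at least one live item. */
void remove_nc(Heap *h, Item *out) {
  assert(h->first_available > 0);
  h->first_available--;
  exch(ROOT_IDX, h->first_available, h->heap_cells);  /* root joins the junk */
  sink(ROOT_IDX, h->heap_cells, h->first_available);
  *out = h->heap_cells[h->first_available];
}
```

Inserting priorities 5, 1, and 3 and then removing three times yields the items in the order 1, 3, 5. The removed root is parked in the first junk cell, which is the split between “live” and “junk” that the separation logic proof frames away.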


5.2 Bottom-Up Heapify and Heapsort 


Floyd’s bottom-up procedure for constructing a binary heap in linear time, and 
using a binary heap to sort, are classics of the literature [20,66]. Happily, while 
the asymptotic bound on heap construction is nontrivial, the implementations 
of both are basically repeated calls to sink (and exchanges to remove the root): 


19 void build_heap(Item arr[], unsigned int size) {
20   unsigned int start = PARENT(size);
21   while(1) { sink(start, arr, size);
22     if (start == 0) break; start--; } }
23 void heapsort_rev(Item* arr, unsigned int size) {
24   build_heap(arr,size);
25   while (size > 1) { size--;
26     exch(ROOT_IDX, size, arr); sink(ROOT_IDX, arr, size); } }


Given that in §5.1 we already generalized the specification for sink to handle
a portion of the array being unordered, the verification of these functions
is straightforward. There is, however, the possibility of a subtle underflow on 
line 20, in the case when building an empty heap (i.e., size = 0). In turn, 
this means that heapsort_rev as given above cannot sort empty lists; in our 
“basic” implementation we strengthen the precondition accordingly, whereas 
in our “Kruskal” implementation we add a line before 24 that returns when 
size = 0. We use a max-heap for Kruskal because heapsort yields a reverse 
sorted list. 
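
A self-contained sketch of the guarded variant, specialised to plain int arrays, is given below. It is an illustration under simplifying assumptions rather than the verified “Kruskal” implementation: the payload is a bare int, and less is written with >= so that the heap is max-ordered and the output is ascending.

```c
#define ROOT_IDX 0u
#define PARENT(x) (((x) - 1u) / 2u)
#define LEFT_CHILD(x) (2u * (x) + 1u)

static void exch(unsigned int j, unsigned int k, int arr[]) {
  int t = arr[j]; arr[j] = arr[k]; arr[k] = t;
}

/* Max-heap comparison: returns nonzero when arr[j] may sit above arr[k]. */
static int less(unsigned int j, unsigned int k, int arr[]) {
  return arr[j] >= arr[k];
}

static void sink(unsigned int k, int arr[], unsigned int available) {
  while (LEFT_CHILD(k) < available) {
    unsigned int j = LEFT_CHILD(k);
    if (j + 1 < available && less(j + 1, j, arr)) j++;
    if (less(k, j, arr)) break;
    exch(k, j, arr);
    k = j;
  }
}

/* Requires size > 0: PARENT(0u) would wrap around to UINT_MAX / 2u. */
static void build_heap(int arr[], unsigned int size) {
  unsigned int start = PARENT(size);
  while (1) {
    sink(start, arr, size);
    if (start == 0) break;
    start--;
  }
}

void heapsort_rev(int arr[], unsigned int size) {
  if (size == 0) return;  /* guard added before the call to build_heap */
  build_heap(arr, size);
  while (size > 1) {
    size--;
    exch(ROOT_IDX, size, arr);  /* move the current maximum to its final slot */
    sink(ROOT_IDX, arr, size);
  }
}
```

Each pass moves the root of the remaining heap to the end of the array, so the root ordering is reversed in the output: a min-heap would leave the array in descending order, whereas the max-heap used here leaves it ascending.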


5.3 Modifying an Element’s Priority 


To support edit-priority, each live item is associated not only with its usual int 
priority but also given a unique unsigned int “key”, generated during insert 
and returned to the client. The binary heap internally maintains a secondary 
array key_table that maps each key to the current location of the associated 
item within the primary heap array. The client calls edit_priority by supplying
the key for the item that it wishes to modify, which the binary heap looks up in 
the key_table to locate the item in the heap array before calling sink or swim. 
To keep everything linked together, the key_table is modified during exchange. 

To generate the keys on insert, we store a key field within each heap-item in 
the main array. These keys are initialized to 0..(capacity − 1), and thereafter
are never modified other than when two cells are swapped during exchange. An 
invariant can then be maintained that the keys from the “live” and “junk” parts 
have no duplicates. On insertion, we “recycle” the key of the first “junk” item, 
which is by the invariant known to be appropriately fresh. 
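
The bookkeeping described in this section can be sketched as follows. The code is a simplified, unverified illustration: the Heap layout, heap_init, and the int payload are assumptions made for the example, and the choice between swim and sink in edit_priority is decided by comparing the old and new priorities, which is one of several reasonable options.

```c
#define ROOT_IDX 0u
#define PARENT(x) (((x) - 1u) / 2u)
#define LEFT_CHILD(x) (2u * (x) + 1u)

typedef struct { unsigned int key; int priority; int data; } Item;

typedef struct {                 /* hypothetical layout */
  unsigned int capacity;
  unsigned int first_available;  /* cells [0, first_available) are live */
  Item *heap_cells;              /* primary heap array */
  unsigned int *key_table;       /* key_table[key] = index of that item */
} Heap;

/* Swap two cells and keep key_table pointing at their new positions. */
static void exch(unsigned int j, unsigned int k, Heap *h) {
  Item t = h->heap_cells[j];
  h->heap_cells[j] = h->heap_cells[k];
  h->heap_cells[k] = t;
  h->key_table[h->heap_cells[j].key] = j;
  h->key_table[h->heap_cells[k].key] = k;
}

static int less(unsigned int j, unsigned int k, const Heap *h) {
  return h->heap_cells[j].priority <= h->heap_cells[k].priority;
}

static void swim(unsigned int k, Heap *h) {
  while (k > ROOT_IDX && less(k, PARENT(k), h)) {
    exch(k, PARENT(k), h);
    k = PARENT(k);
  }
}

static void sink(unsigned int k, Heap *h) {
  while (LEFT_CHILD(k) < h->first_available) {
    unsigned int j = LEFT_CHILD(k);
    if (j + 1 < h->first_available && less(j + 1, j, h)) j++;
    if (less(k, j, h)) break;
    exch(k, j, h);
    k = j;
  }
}

/* Keys start as 0..capacity-1 and are only ever moved by exch. */
void heap_init(Heap *h, Item cells[], unsigned int table[], unsigned int capacity) {
  h->capacity = capacity;
  h->first_available = 0;
  h->heap_cells = cells;
  h->key_table = table;
  for (unsigned int i = 0; i < capacity; i++) { cells[i].key = i; table[i] = i; }
}

/* Precondition: room for one more item. Returns the recycled key. */
unsigned int insert_nc(Heap *h, int priority, int data) {
  unsigned int slot = h->first_available;
  unsigned int key = h->heap_cells[slot].key;  /* key of the first junk item */
  h->heap_cells[slot].priority = priority;
  h->heap_cells[slot].data = data;
  h->first_available++;
  swim(slot, h);
  return key;
}

/* Precondition: at least one live item. */
void remove_nc(Heap *h, Item *out) {
  h->first_available--;
  exch(ROOT_IDX, h->first_available, h);
  sink(ROOT_IDX, h);
  *out = h->heap_cells[h->first_available];
}

/* Precondition: key belongs to a live item. */
void edit_priority(Heap *h, unsigned int key, int new_priority) {
  unsigned int idx = h->key_table[key];
  int old = h->heap_cells[idx].priority;
  h->heap_cells[idx].priority = new_priority;
  if (new_priority <= old) swim(idx, h); else sink(idx, h);
}
```

Because every cell, live or junk, always carries exactly one of the keys 0..(capacity − 1), the key found in the first junk cell cannot coincide with the key of any live item, so no counter is needed.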


5.4 Related Work on Binary Heaps in Algorithms and Formal 
Methods 


J. W. J. Williams published the binary heap data structure, along with heap- 
sort, in June 1964 [28]. Floyd proposed his linear-time bottom-up method to 
construct such heaps that December [27]. Since then, binary heaps, including 
Floyd’s construction and heapsort, have become a staple of the introductory 
data structure diet [20]. On the other hand, standard textbooks are surprisingly 
vague on the implementation of edit_priority [20,38,66], and completely silent 
on the generation of fresh keys during insertion. Our method above of “recycling 
keys” avoids a subtle overflow in a naive approach, and does not appear in the 
literature we examined. The naive idea is to have a global counter starting at 
0, which is then increased on each insert. Unfortunately, this is unsound: during 
(very) long runs involving both insert and remove-min, this key counter will 
overflow. Although overflow is defined in C for unsigned int, this overflow is 
fatal algorithmically: multiple live items could be assigned the same key. 
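
The failure mode of the naive counter is easy to reproduce in miniature. In the sketch below, which is purely illustrative, uint8_t stands in for unsigned int so that the wrap-around is reached after 256 insertions rather than after 2^32; the function names are hypothetical.

```c
#include <stdint.h>

/* Naive scheme: hand out the value of a global counter as the fresh key.
   uint8_t is used only so that the counter wraps quickly. */
static uint8_t next_key = 0;

static uint8_t naive_fresh_key(void) {
  return next_key++;  /* well-defined wrap from 255 back to 0 */
}

/* Simulates one long-lived item, then `churn` insert/remove pairs, then one
   more insert. Returns 1 if the last key equals the long-lived item's key. */
int naive_scheme_collides(unsigned int churn) {
  next_key = 0;
  uint8_t long_lived = naive_fresh_key();
  for (unsigned int i = 0; i < churn; i++) {
    (void)naive_fresh_key();  /* item inserted and later removed */
  }
  return naive_fresh_key() == long_lived;
}
```

With 255 intermediate insertions the counter returns to 0 while the first item is still live, so two live items share a key and a later edit_priority on that key would be ambiguous.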

Binary heaps have been verified several times in the literature. They were 
problem 2 of the VACID-0 benchmark [49], and solved in this regard as well 
by the Why3 team [69]. These solutions did not implement bottom-up heap 
construction or edit priority. Summers verified heapsort in Viper, again without 
bottom-up heap construction [56]. Lammich verified Introsort, which includes a 
heapsort subroutine [44]. Previous formal work ignores nitty-gritty C issues such 
as the difference between signed and unsigned arithmetic. We believe ours is the
first formally verified binary heap to support edit-priority.


6 Engineering Considerations 


Verifying real code is meaningfully harder than verifying toy implementations. 
On top of such challenges, verifying graph algorithms requires a significant 
amount of mathematical machinery: there are many plausible ways to define 
basic notions such as reachability, but not all of them can handle the challenges 
of verifying real code [72]. Moreover, we would like our mathematical, spatial, 
and verification machinery to be generic and reusable. 
All of the above suggests that it is important to work within existing formal
proof developments due to a strong desire to not reinvent very large wheels
(the existing proof bases we work with contain hundreds of thousands of lines 
of formal proof). We chose to work with the CompCert certified compiler [50]; 
the Verified Software Toolchain [4], which provides significant tactic support 
for separation logic-based deductive verification of CompCert C programs; and 
the CertiGraph framework [73], which provides much pure and spatial reason- 
ing support for verifying graph-manipulating programs within VST. We did so 
because these frameworks can handle the challenges of real code and because
CertiGraph included several fully verified implementations of union-find that we
wished to reuse in our verification of Kruskal’s algorithm. 

Modular formal proof development involves major software engineering
challenges [64]. Accordingly, we took care to factor our extensions to CertiGraph into
generic and reusable pieces. This factoring allows us to reuse machinery between 
verifications, including in the mathematical, spatial, and verification levels. So, 
e.g., we share significant pure and spatial machinery between Dijkstra, Prim, 
and Kruskal. Moreover, we maintain good separation between pure and spatial 
reasoning. So, e.g., both our Dijkstra and Prim verifications can handle multiple 
spatial variants of adjacency matrices without significant change. 

On the other hand, working within existing developments involves some chal- 
lenges, primarily in that some design decisions have been already made and are 
hard to change. Moreover, our verifications tickled numerous bugs within VST, 
including: overly-aggressive automatic entailment simplifying, poor error mes- 
sages, improper handling of C structs, and performance issues. We have been 
fortunate that the VST team has been willing to work with us to fix such bugs, 
although some work still remains. Performance remains one area of focus: for 
example, checking our verification of Kruskal with a 3.7 GHz processor and 32 GB
of memory takes more than 22 min even after all of the generic pure and spatial 
reasoning has been checked, i.e. approximately 7s per line of C code (includ- 
ing whitespace and comments). This performance is unviable for verifying an 
industrial-sized application of equivalent difficulty: e.g., it would take 13 years 
for Coq to check the proof for 1,000,000 lines of C. Before some optimizations 
to our proof structure, the time was significantly longer still. 

Our contributions to CertiGraph include pieces that are reused repeatedly 
and pieces that are more bespoke. Below, we give a sense of both the size of our 
development (lines of formal Coq proof) and the mileage we get out of our own 
work via reuse. Items “added with +” are very similar to each other (within 1%);
Prim #4 is the version that does not set the root, i.e. on the right in Fig. 3. 
Name                  Used  LoC      Name                   LoC
MathAdjMat            7x    165      DijkSpec1+2+3          301
Undirected            5x    2,139    VerifDijk1+2+3         3,554
MathUAdjMat           4x    1,024    PrimSpec1+2+3+4        508
SpaceAdjMat1+2+3      7x    499      VerifPrim1+2+3+4       7,455
EdgeListGraph         1x    911      KruskalSpec            302
MathDijkGraph         3x    165      VerifKruskal           1,606
DijkPureProof         3x    2,124    VerifHeapSort          568
UndirectedUF          1x    183      VerifBasicBinaryHeap   777
BinaryHeapModel       1x    1,870    VerifAdvBinaryHeap     2,253
Total (pure/spatial)        9,080    Total (verifications)  17,234


In total we have 26,314 novel lines of Coq proof to verify 1,155 lines of C code 
divided among 12 files, including 3 variants of Dijkstra, 4 variants of Prim, 1 of 
Kruskal (which includes its heapsort), and 2 binary heaps. 


7 Concluding Thoughts: Related and Future Work 


We have already discussed work directly related to Dijkstra’s (§3.3), Prim’s 
(§4.3), and Kruskal’s (§4.5) algorithms, as well as binary heaps (§5.4). Summa- 
rizing briefly to the point of unreasonableness, our observations about Dijkstra’s 
overflow and Prim’s specification are novel, and existing formal proofs focus on 
code working within idealized environments rather than handling the real-world 
considerations that we do. We have also discussed the three formal developments 
we build upon and extend: CompCert, VST, and CertiGraph (Sect. 6). Our goal 
now is to discuss mechanized graph reasoning and verification more broadly. 


Reasoning About Mathematical Graphs. There is a 30+ year history of mech- 
anizing graph theory, beginning at least with Wong [74] and Chou [19] and 
continuing to the present day; Wang discusses many such efforts [72, §3.3]. The 
two abstract frameworks that seem closest to ours are those by Noschinski [58]; 
and by Lammich and Nipkow [45]. The latter is particularly related to our work, 
because they too start with a directed graph library and must extend it to handle 
undirected graphs so that they can verify Prim’s algorithm. 


More-Automated Verification. Broadly speaking, mechanized verification of soft- 
ware falls in a spectrum between more-automated-but-less-precise verifications 
and less-automated-but-more-precise verifications. Although VST contains some 
automation, we fall within the latter camp. In the former camp, landmark ini- 
tial separation logic [63] tools such as Smallfoot [7] have grown into Facebook’s 
industrial-strength Infer [11]. Other notable relatively-automated separation 
logic-based tools include HIP/SLEEK [17], Bedrock [18], KIV [24], VerCors [9], 
and Viper [57]. More-automated solutions that use techniques other than
separation logic include Boogie [6], BLAST [8], Dafny [48], and KeY [2]. In Sect. 3.3
we discuss how some of these more-automated approaches have been applied to 
verify Dijkstra’s algorithm. Petrank and Hawblitzel’s Boogie-based verification 
of a garbage collector [60], Bubel’s KeY-based verification of the Schorr-Waite 
algorithm, and Chen et al.’s Tarjan’s strongly connected components algorithm 
in (among others) Why3 [16] are three examples of more-automated verification
of graph algorithms. Müller verified binomial (not binary) heaps in Viper,
although his implementation did not support an edit-priority function [55]. The 
VOCAL project has verified a number of data structures, including binary and 
other heaps (all without edit-priority) and union-find [13]. 

We are not confident that more-automated tools would be able to repli- 
cate our work easily. We prove full functional correctness, whereas many more- 
automated tools prove only more limited properties. Moreover, our full functional 
correctness results rely upon a meaningful amount of domain-specific knowledge 
about graphs, which automated tools usually lack. Even if we restrict ourselves 
to more limited domains such as overflows, several more automated efforts did 
not uncover the overflow that we described in Sect. 3.3. The proof that certain 
bounds on edge weights and inf suffice depends on an intimate understanding of 
Dijkstra’s algorithm (in particular, that it explores one edge beyond the optimum 
paths); overall the problem seems challenging in highly-automated settings. The 
more powerful specification we discover for Prim’s algorithm in Sect. 4.2 is like- 
wise not something a tool is likely to discover: human insight appears necessary, 
at least given the current state of machine learning techniques. 

In contrast, several of the potential overflows in our binary heap might 
be uncovered by more-automated approaches, especially those related to the 
PARENT and LEFT_CHILD macros from Sect. 5.1. Although the arithmetic involves 
both addition/subtraction and multiplication/division, we suspect a tool such 
as Z3 [54] could handle it. Moreover, a sufficiently-precise tool would probably 
spot the necessity of forcing the internal constants into unsigned int. The issue 
of sound key generation described in Sect. 5.3 might be a bit trickier. On the 
one hand, unsigned int overflow is defined in C, so real code sometimes relies 
upon it. Accordingly, merely observing that the counter could overflow does not 
guarantee that the code is necessarily buggy. On the other hand, some tools 
might flag it anyway out of caution (i.e. right answer, wrong reason). 


Less-Automated Verification. Although as discussed above some more- 
automated tools have been applied to verify graph algorithms, the problem 
domain is sufficiently complex that many of the verifications discussed in Sect. 
3.3, Sect. 4.3, and Sect. 4.5 use less-automated techniques. Two basic approaches 
are popular. The “shallow embedding” approach is to write the algorithm in the 
native language of a proof assistant. The “deep embedding” approach is to write 
the algorithm in another language whose semantics has been precisely defined in 
the proof assistant. VST uses a deep embedding, and so we do too; one of VST’s 
more popular competitors in the deep embedding style is “Iris Proof Mode” [39]. 
In contrast, Lammich et al. have produced a series of results verifying a variety
of graph algorithms using a shallow embedding (e.g., [32,43,45–47]). From
a bird’s-eye view Lammich et al.’s work is the most related to our results in 
this paper: they verify all three algorithms we do and are able to extract fully- 
executable code, even if sometimes their focus is a bit different, e.g. on novel 
purely-functional data structures such as a priority queue with edit_priority. 


Pen-and-Paper Verification of Graph Algorithms. We use separation logic [63] 
as our base framework. Initial work on graph algorithms in separation logic was 
minimal; Bornat et al. is an early example [10]. Hobor and Villard developed 
the technique of ramification to verify graph algorithms [34], using a particular 
“star/wand” pattern to express heap update. Wang et al. later integrated rami- 
fication into VST as the CertiGraph project we use [73]. Krishna et al. [40] have 
developed a flow algebraic framework to reason about local and global proper- 
ties of flow graphs in the program heap; their flow algebra is mainly used to 
tackle local reasoning of global graphs in program heaps. Flow algebras should 
be compatible with existing separation logics; implementation and integration 
with the Iris project appears to be work in progress [41]. 

Krishna et al. are interested in concurrency [40]; Raad et al. provide another 
example of pen-and-paper reasoning about concurrent graph algorithms [62]. 


Future Work. We see several opportunities for decreasing the effort and/or 
increasing the automation in our approach. At the level of Hoare tuples, we see 
opportunities for improved VST tactics to handle common cases we encounter in 
graph algorithms. At the level of spatial predicates, we can continue to expand 
our library of graph constructions, for example for adjacency lists. We also 
believe there are opportunities to increase modularity and automation at the 
interface between the spatial and the mathematical levels, e.g. we sometimes 
compare C pointers to heap-represented graph nodes for equality, and due to the 
nature of our representations this equality check will be well-defined in C when 
the associated nodes are present in the mathematical graph, so this check should 
pass automatically. 

We believe that more automation is possible at the level of mathematical 
graphs: for example reachability techniques based on regular expressions over 
matrices and related semirings [5,23,70]. We are also intrigued by the recent 
development of various specialized graph logics such as by Costa et al. [21] and 
hope that these kinds of techniques will allow us to simplify our reasoning. The 
key advantage of having end-to-end machine-checked examples such as the ones 
we presented above is that they guide the automation efforts by providing precise 
goals that are known to be strong enough to verify real code. 


Conclusion. We extend the CertiGraph library to handle undirected graphs 
and several flavours of graphs with edge labels, both at the pure and at the 
spatial levels. We verify the full functional correctness of the three classic graph 
algorithms of Dijkstra, Prim, and Kruskal. We find nontrivial bounds on edge 
costs and infinity for Dijkstra and provide a novel specification for Prim. We 
verify a binary heap with Floyd’s heapify and edit_priority. All of our code
is in CompCert C and all of our proofs are machine-checked in Coq. 
Abstract. We introduce verification based on separation logic to Gillian,
a multi-language platform for the development of symbolic analysis tools 
which is parametric on the memory model of the target language. Our 
work develops a methodology for constructing compositional memory 
models for Gillian, leading to a unified presentation of the JavaScript and 
C memory models. We verify the JavaScript and C implementations of the 
AWS Encryption SDK message header deserialisation module, specifically 
designing common abstractions used for both verification tasks, and find 
two bugs in the JavaScript and three bugs in the C implementation. 


1 Introduction 


Separation logic (SL) [25,40] introduced compositional program verification
using Hoare reasoning. Current analysis tools based on ideas from SL include:
the automatic tool Infer [8,9] used inside Facebook to find lightweight bugs in
Java/C/C++/Obj-C programs; the semi-automatic tool Verifast [26], which 
provides full verification for fragments of C and Java; the semi-automatic tool 
JaVerT [21], which provides bug-finding and verification for JavaScript (JS)
programs; and the Viper architecture, which provides a verification backend
for multiple programming languages, including Java, Rust, and Python. Our goal 
is to introduce verification based on SL to Gillian [19], a multi-language platform
for symbolic analysis, integrating bug-finding and verification in the spirit of 
JaVerT and targeting many languages in the spirit of Viper. 

Gillian currently supports three types of program analysis: symbolic test- 
ing, verification and bi-abduction. In [19], the focus was on symbolic testing, 
parametrised on complete concrete and symbolic memory models of the target 
language (TL), and underpinned by a core symbolic execution engine with strong 
mathematical foundations. Gillian analysis is done on GIL, an intermediate goto 
language parametric on a set of memory actions, which describe the fundamental 
ways in which TL programs interact with their memories. To instantiate Gillian 
to a new TL, a tool developer must: (1) identify the set of the TL memory actions
and implement the TL memory models using these actions; and (2) provide a 
trusted compiler from the TL to GIL, which preserves the TL memory models and
the semantics. In [19], Gillian was instantiated to JS and C, and used to find bugs 
in two real-world data-structure libraries, Buckets.js [43] and Collections-C [41]. 
Here, we introduce compositional memory models for Gillian, extend Gillian anal- 
ysis with verification based on separation logic, adapt Gillian-JS and Gillian-C 
to this compositional setting, and provide verified specifications of the JS and C 
implementations of the deserialisation module of the AWS Encryption SDK. 


The compositional Gillian memory models ($) are given by the tool developer 
for each TL instantiation. They are based on partial memories, and formulated us- 
ing core predicates and the associated consumer and producer actions. Core 
predicates describe fundamental units of TL memories: e.g., a property of a JS 
object and a C block cell. Consumers and producers, respectively, frame off and 
frame on the TL memory resource described by the core predicates. Partiality 
and frame are familiar concepts from SL [25,40,11]. What is perhaps less familiar
is our emphasis on negative resource: i.e., the resource known to be absent from 
the partial memory. For example, in JS, a new extensible object is known not to 
contain any property; and, in C, a freed block is known not to be in memory and 
a cell is known not to exist beyond the block bound. We introduce a methodology 
for designing Gillian compositional memory models, and apply it to JS and C (§3),
resulting in an unexpected similarity between the two models. Our compositional
JS memory models follow those given in work on a JS program logic [24] and the 
JaVerT tool [21], where negative resource was essential for frame preservation,
inspired by the use of negative resource to capture stability properties in the 
CAP concurrent separation logic [14], now used in Iris [27]. Our compositional C 
memory models are based on the complete CompCert memory model [31]. Despite 
a large body of work on separation logic for C, we were unable to find a partial 
C memory model that captures the negative resource in its entirety. The nearest 
is probably the CH2O formalism [29], which handles freed locations but not block
bounds. Negative resource for freed locations has also been used in incorrectness 
logic [39], and for block bounds in a program logic for WebAssembly [48]. 


We build Gillian verification on top of our compositional memory models. 
In particular, using the core predicates, we design an assertion language for 
writing function specifications in separation logic and, using the consumers and 
producers, we build a fully parametric spatial entailment engine which enables 
the use of function specifications in symbolic execution. Gillian also supports 
user-defined predicates, which allow tool developers to identify the TL language 
interface familiar to code developers, and code developers to describe and prove 
properties about the particular data structures in their programs. 


We extend Gillian-JS and Gillian-C to enable verification, introducing the JS 
and C compositional memory models, and using the same trusted compilers as 
in [19]. With these instantiations, we provide functionally-correct, verified specifi- 
cations of the message header deserialisation module of the AWS Encryption SDK 
JS and C implementations ($al g$). This is stable, critical, industry-grade code 
(~200 loc for JS, ~950 loc for C), which uses advanced language features to manipulate
complex data structures. To verify this code, we create language-independent
predicates to capture the message header, which we then connect without modification
to both JS and C memories, giving specifications for the module functions.
We also build a library of associated lemmas, used for the verification of both 
implementations. The verification itself required a substantial improvement of 
the reasoning capabilities of Gillian, especially when it came to handling arrays 
of symbolic size. We discovered two bugs in the JS implementation: one a form 
of prototype poisoning, predicted theoretically in our paper on JaVerT [21]; and 
another that allowed third parties to potentially alter authenticated, non-secret 
data. We have also discovered three bugs in the C implementation: one which 
allowed some malformed headers to be parsed as correct; one over-allocation; and 
one undefined behaviour. All of these bugs have been fixed. 


2 Gillian Verification 


We introduce Gillian verification based on separation logic (§2.2), extending the
GIL execution engine presented in [19] with compositional memory models (§2.1). 


2.1 Compositional Memory Models 


GIL is a simple goto intermediate language whose syntax is given below. It is 
parametric on a set of TL memory actions, $A \ni \alpha$, given per instantiation by
the tool developer. GIL values, $v \in \mathit{Val}$, contain numbers, strings, booleans,
uninterpreted symbols (used, e.g., to represent memory locations), simple types
(e.g., numbers, strings), function identifiers and lists of values. GIL expressions,
$e \in \mathit{Expr}$, contain values, program variables, and unary and binary operators (e.g.,
addition, list concatenation); GIL symbolic expressions, $\hat{e} \in \widehat{\mathit{Expr}}$, are analogous
except that symbolic variables, $\hat{x} \in \hat{X}$, are used instead of program variables.


GIL Syntax 

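\[
\begin{array}{rcl}
v \in \mathit{Val} & ::= & n \in \mathcal{N} \mid s \in \mathcal{S} \mid b \in \mathcal{B} \mid \varsigma \in \mathcal{U} \mid \tau \in \mathcal{T} \mid f \in \mathcal{F} \mid \bar{v} \in \mathit{List}(\mathit{Val}) \\
e \in \mathit{Expr} & ::= & v \mid x \mid \ominus e \mid e_1 \oplus e_2 \\
\hat{e} \in \widehat{\mathit{Expr}} & ::= & v \mid \hat{x} \mid \ominus \hat{e} \mid \hat{e}_1 \oplus \hat{e}_2 \\
c \in \mathit{Cmd} & ::= & x := e \mid \mathsf{ifgoto}\ e\ i \mid x := e(e') \mid x := \alpha(e) \mid x := \mathsf{uSym}/\mathsf{iSym}(e) \\
 & \mid & \mathsf{return}\ e \mid \mathsf{fail}\ e \mid \mathsf{vanish} \\
\mathit{func} \in \mathit{Func} & ::= & f(x)\{\bar{c}\} \\
p \in \mathit{Prog} & ::= & \mathcal{P}(\mathit{Func})
\end{array}
\]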


GIL commands, $c \in \mathit{Cmd}$, contain variable assignment, conditional goto,
function call, memory actions, allocation of uninterpreted/interpreted symbols,
function return, error termination and path cutting. A GIL function, $f(x)\{\bar{c}\}$,
comprises an identifier $f \in \mathcal{F}$, a formal parameter and a body given by a list
of commands $\bar{c}$. A GIL program is a set of GIL functions with unique identifiers.

GIL execution is defined in terms of state models, which are parametric 
on a value set, $V \supseteq \mathit{Val}$, and a set of memory actions, $A$. We distinguish
the Boolean value set, $\Pi \subseteq V$, and refer to $\pi \in \Pi$ as a context. State models
expose an interface consisting of state actions, $A \uplus A_0$, where the actions


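3 The implementation supports multiple parameters.

\[
\textsc{Memory Action - Success} \quad
\frac{
  \begin{array}{c}
  \mathsf{cmd}(p, cs, i) = x := \alpha(e) \\
  \sigma.\mathsf{eval}_e(-) \rightsquigarrow v \qquad \sigma.\alpha(v) \rightsquigarrow (\sigma', v')^{S} \\
  \sigma'.\mathsf{setVar}_x(v') \rightsquigarrow \sigma''
  \end{array}
}{
  p \vdash \langle \sigma, cs, i \rangle \rightsquigarrow \langle \sigma'', cs, i{+}1 \rangle^{S}
}
\]

\[
\textsc{Memory Action - Error} \quad
\frac{
  \begin{array}{c}
  \mathsf{cmd}(p, cs, i) = x := \alpha(e) \\
  \sigma.\mathsf{eval}_e(-) \rightsquigarrow v \qquad \sigma.\alpha(v) \rightsquigarrow (\sigma', v')^{r} \\
  r \neq S \qquad o = (\text{if } r = E \text{ then } E \text{ else } M)
  \end{array}
}{
  p \vdash \langle \sigma, cs, i \rangle \rightsquigarrow \langle \sigma', cs, i \rangle^{o(v')}
}
\]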


Fig. 1: GIL Execution Semantics: Memory Actions 


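$A_0 = \{\mathsf{setVar}_x\}_{x \in X} \cup \{\mathsf{setStore}, \mathsf{getStore}\} \cup \{\mathsf{eval}_e\}_{e \in \mathit{Expr}} \cup \{\mathsf{assume}, \mathsf{uSym}, \mathsf{iSym}\}$
address store management, expression evaluation, branching, and allocation.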


Definition 1 (State Model). A state model, $S(V, A) \triangleq (|S|, \mathsf{ea})$, comprises: a
set of states $\sigma = (\mu, \rho, \pi) \in |S|$, containing a memory $\mu$, a variable store $\rho$, and
a (satisfiable) context $\pi$;⁴ and an action execution function, $\mathsf{ea} : (A \uplus A_0) \to
|S| \to V \to \mathcal{P}(|S| \times V \times R)$, with the result $r \in R = \{S, E, M\}$ denoting success,
non-correctible error, or missing information error, pretty-printed $\sigma.\alpha(v) \rightsquigarrow
\{(\sigma_i, v_i)^{r_i}|_{i \in I}\}$ for all outcomes and $\sigma.\alpha(v) \rightsquigarrow (\sigma_i, v_i)^{r_i}$ for a specific outcome,
with countable $I$. The value set of concrete state models is the set of GIL values,
$\mathit{Val}$;⁵ the value set of symbolic state models is the set of symbolic expressions, $\widehat{\mathit{Expr}}$.


Definition 2 (GIL Execution Semantics). Given a state model S, the GIL 
execution semantics has judgements of the form: 


\[
p \vdash \langle \sigma, cs, i \rangle^{o} \rightsquigarrow \langle \sigma', cs', j \rangle^{o'}
\]

with: call stacks, $cs \in \mathit{Calls}$; command indexes, $i, j \in \mathbb{N}$; and outcomes, $o \in O$.


The GIL execution semantics is standard for a goto language, except that it is 
parametrised by the memory actions. Call stacks capture function-related control 
flow, with cmd(p, cs, i) denoting the i-th command of the currently executing 
function (cf. [33] for details). Outcomes, $o \in O ::= S \mid N(v) \mid E(v) \mid M(v)$, indicate
how the execution is to proceed: S states that it can continue; N(v) states that it 
terminated normally with return value v; and E(v) and M(v) state that it failed 
with either a non-correctible or missing information error described by v. We 
give the rules for memory action execution in Figure 1; all can be found in [33].


Compositional Memory Models. We move from whole-program memory 
models to compositional memory models by introducing memory core predicates,
$\gamma \in \Gamma$, which represent the fundamental units of the TL memory model
(e.g., a memory cell). Core predicates take two lists of parameters, in-parameters
(or ins), denoted $\vec{v}_i$, and out-parameters (or outs), denoted $\vec{v}_o$, such that from the
ins we can learn the outs. This concept is similar to predicate parameter modes 


4 States also include allocators (cf. [33] for details), elided to limit clutter.
5 Note that the only satisfiable concrete context is true, meaning that concrete contexts 
can be elided and concrete states can be viewed as memory-store pairs, (u, p). 




of [37] and we use it to implement a parametric spatial entailment engine. An 
example of a core predicate is the cell assertion, $x \mapsto v$, which captures a cell in
memory at address $x$ having value $v$. Its in-parameter is $x$, and its out-parameter
is $v$, because, if we know $x$, we can find $v$ by looking it up in the memory.
With each core predicate $\gamma \in \Gamma$, we associate a consumer and a producer
memory action, denoted by $\mathsf{cons}_\gamma$ and $\mathsf{prod}_\gamma$ respectively, to obtain the set of
predicate actions $A_\Gamma = \bigcup_{\gamma \in \Gamma} \{\mathsf{cons}_\gamma, \mathsf{prod}_\gamma\}$, whose meaning is discussed shortly.


Definition 3 (Compositional Memory Model). Given value set $V$ and core
predicate set $\Gamma$, a compositional memory model, $M(V, \Gamma) \triangleq (|M|, \mathsf{Wf}, \mathsf{ea}_\Gamma)$,
comprises: (1) a partial commutative monoid (PCM)⁶ $|M| = (|M|, \bullet, 0)$, where
0 denotes the (indivisible) empty memory; (2) a well-formedness relation, $\mathsf{Wf} \subseteq
|M| \times \Pi$, with $\mathsf{Wf}_\pi(\mu)$ denoting that memory $\mu$ is well-formed in (satisfiable)
context $\pi$; and (3) a predicate action execution function, $\mathsf{ea}_\Gamma : A_\Gamma \times |M| \times V \times \Pi \rightharpoonup
\mathcal{P}(|M| \times V \times \Pi \times R)$, pretty-printed $\mu.\alpha(v)_\pi \rightsquigarrow \{(\mu_i, v_i)^{r_i}_{\pi_i}|_{i \in I}\}$ for all outcomes
and $\mu.\alpha(v)_\pi \rightsquigarrow (\mu_i, v_i)^{r_i}_{\pi_i}$ for a specific outcome, with countable $I$. The value set
of concrete memory models is the set of GIL values, $\mathit{Val}$; the value set of symbolic
memory models is the set of symbolic expressions, $\widehat{\mathit{Expr}}$.


We discuss the most important properties that the components of compo- 
sitional memory models must satisfy; a full list is available in [33]. The PCM 
requirement is well-known from separation logic [40, 11]. Well-formedness holds
only for satisfiable contexts, and describes the separation of symbolic resource and
any further TL-specific well-formedness criteria (cf. §3). It must be monotonic
with respect to context strengthening, compatible with the PCM composition,
and the empty memory must be well-formed in any satisfiable context. The action
execution function, $\mu.\alpha(v)_\pi \rightsquigarrow \{(\mu_i, v_i)^{r_i}_{\pi_i}|_{i \in I}\}$, denotes that, in a memory $\mu$ that
is well-formed in the context $\pi$, executing action $\alpha$ with parameter $v$ yields a
countable number of branches characterised by the non-overlapping⁷ satisfiable
contexts $\pi_i$, each of which implies $\pi$ and makes the corresponding memory $\mu_i$ well-formed,
and all of which together cover $\pi$ (i.e., $\pi = \bigvee_{i \in I} \pi_i$). This last property
means that memory actions do not drop paths, which is essential for verification. 

The intuition behind consumers and producers is that consumers frame off 
the core predicate resource (CPR), uniquely determined by the core predicate ins, 
and the producers frame it on. The following properties capture this intuition. 
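First, we define the CPR of a core predicate $\gamma(\vec{v}_i \cdot \vec{v}_o)$ as the memory resulting
from its production in 0, which must succeed in any satisfiable context:

\[
\pi\ \mathrm{SAT} \implies 0.\mathsf{prod}_\gamma(\vec{v}_i \cdot \vec{v}_o)_\pi \rightsquigarrow (\gamma(\vec{v}_i \cdot \vec{v}_o), \mathsf{true})^{S}_{\pi} \;\land\; \gamma(\vec{v}_i \cdot \vec{v}_o) \neq 0,
\]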


overloading notation for the core predicate and its resource. Moreover, we require 
that any successful production frames on the CPR: 


\[
\mu.\mathsf{prod}_\gamma(\vec{v}_i \cdot \vec{v}_o)_\pi \rightsquigarrow (\mu', \mathsf{true})^{S}_{\pi'} \implies \mu' = \mu \bullet \gamma(\vec{v}_i \cdot \vec{v}_o),
\]
6 A PCM, $X = (X, \bullet, 0)$, comprises a carrier set $X$ (overloaded for simplicity), a partial,
associative, and commutative composition operator $\bullet$, and unit element 0.
7 Note that this requirement makes concrete memory actions deterministic.
and also that producers cannot return missing information errors, as they are
meant to succeed precisely when the CPR is missing. The consumers, on the 
other hand, must succeed if and only if the CPR is present in memory: 


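\[
\begin{array}{l}
\mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \vec{v}_o)^{S}_{\pi'} \implies \pi' \vdash \mu = \mu' \bullet \gamma(\vec{v}_i \cdot \vec{v}_o) \\
\pi \vdash \mu = \mu' \bullet \gamma(\vec{v}_i \cdot \vec{v}_o) \land \mathsf{Wf}_\pi(\mu) \implies \mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \vec{v}_o)^{S}_{\pi}
\end{array}
\]


with the resulting context $\pi'$ having enough information to isolate the CPR.⁸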
Interestingly, erroneous executions cannot be fully characterised in terms of 
CPR presence or absence, because of TL-specific error cases: for example, in C, 
attempting to either get or set the value of a block cell that is beyond the block 
bound raises an out-of-bounds error (cf. §3). What we require instead is that
consumed CPR can always be re-produced, that producers fail in a memory in 
which consumers succeed, and that producers succeed in a memory in which 
consumers return a missing information error (and vice versa for the latter): 

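\[
\begin{array}{l}
\mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \vec{v}_o)^{S}_{\pi'} \implies \mu'.\mathsf{prod}_\gamma(\vec{v}_i \cdot \vec{v}_o)_{\pi'} \rightsquigarrow (\mu'', \mathsf{true})^{S}_{\pi''} \\
\mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \vec{v}_o)^{S}_{\pi'} \implies \mu.\mathsf{prod}_\gamma(\vec{v}_i \cdot -)_{\pi'} \rightsquigarrow (\mu, \mathsf{false})^{E}_{\pi'} \\
\mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu, \mathsf{false})^{M}_{\pi'} \iff \mu.\mathsf{prod}_\gamma(\vec{v}_i \cdot \vec{v}_o)_{\pi'} \rightsquigarrow (\mu \bullet \gamma(\vec{v}_i \cdot \vec{v}_o), \mathsf{true})^{S}_{\pi'}
\end{array}
\]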


The properties given so far allow us, for example, to prove that well-formed 
memories cannot contain duplicated CPR. The final property below requires that 
non-missing executions of consumers and erroneous executions of producers must 
be frame-preserving, with the former formulated as follows: 


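\[
\begin{array}{l}
\mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \vec{v}_o)^{r}_{\pi'} \land r \neq M \land (\pi'' \Rightarrow \pi') \land \mathsf{Wf}_{\pi''}(\mu \bullet \mu_f) \\
\quad \implies (\mu \bullet \mu_f).\mathsf{cons}_\gamma(\vec{v}_i)_{\pi''} \rightsquigarrow (\mu' \bullet \mu_f, \vec{v}_o)^{r}_{\pi''}
\end{array}
\]


where $\pi''$ effectively maintains well-formedness constraints for $\mu$, adds on further
ones required for $\mu \bullet \mu_f$ to be defined and also isolates the consumed CPR.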
Note that neither missing executions of consumers nor successful executions of 
producers can be frame preserving, as framing on the appropriate CPR could 
result in success for the former, and a duplicated resource error for the latter. 
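
The consumer and producer properties can be made concrete with a small sketch. It assumes the simplest possible concrete memory, a partial map from integer addresses to values with a single cell core predicate, and it ignores contexts, which are trivial in the concrete case. The names `cons_cell`, `prod_cell` and `compose` are illustrative and are not part of Gillian.

```python
from typing import Any, Dict, Tuple

Memory = Dict[int, Any]            # partial memory: address -> value
Outcome = Tuple[Memory, Any, str]  # (memory, returned value, result flag)

S, E, M = "S", "E", "M"  # success, non-correctible error, missing information


def cons_cell(mu: Memory, addr: int) -> Outcome:
    """Consumer: frame off the cell at `addr` (the in-parameter) and
    return its value (the out-parameter)."""
    if addr not in mu:
        return mu, False, M  # the resource is absent from the partial memory
    rest = {a: v for a, v in mu.items() if a != addr}
    return rest, mu[addr], S


def prod_cell(mu: Memory, addr: int, value: Any) -> Outcome:
    """Producer: frame on the cell `addr -> value`."""
    if addr in mu:
        return mu, False, E  # duplicated resource, never a missing error
    return {**mu, addr: value}, True, S


def compose(mu1: Memory, mu2: Memory) -> Memory:
    """PCM composition as disjoint union (undefined on overlapping domains)."""
    if mu1.keys() & mu2.keys():
        raise ValueError("composition undefined: overlapping cells")
    return {**mu1, **mu2}
```

In this toy model, consuming a cell and producing it again restores the original memory, producing a cell that is already present fails with `E`, and a successful consumption is unaffected by composing the memory with a disjoint frame.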

Using the consumers and producers, we are able to derive getter and setter 
actions, $A = \{\mathsf{get}_\gamma, \mathsf{set}_\gamma : \gamma \in \Gamma\}$, which perform frame-preserving CPR lookup
and mutation, as given below. We discuss getters and setters further in §3, in the
context of our JS and C instantiations. 


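\[
\textsc{Getter: Success} \quad
\frac{
  \mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \vec{v}_o)^{S}_{\pi'} \qquad
  \mu'.\mathsf{prod}_\gamma(\vec{v}_i \cdot \vec{v}_o)_{\pi'} \rightsquigarrow (\mu'', \mathsf{true})^{S}_{\pi''}
}{
  \mu.\mathsf{get}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu'', \vec{v}_o)^{S}_{\pi''}
}
\]

\[
\textsc{Setter: Success} \quad
\frac{
  \mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', -)^{S}_{\pi'} \qquad
  \mu'.\mathsf{prod}_\gamma(\vec{v}_i \cdot \vec{v}_o)_{\pi'} \rightsquigarrow (\mu'', \mathsf{true})^{S}_{\pi''}
}{
  \mu.\mathsf{set}_\gamma(\vec{v}_i \cdot \vec{v}_o)_\pi \rightsquigarrow (\mu'', \mathsf{true})^{S}_{\pi''}
}
\]

\[
\textsc{Getter: Non-Success} \quad
\frac{
  \mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \mathsf{false})^{r}_{\pi'} \qquad r \neq S
}{
  \mu.\mathsf{get}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \mathsf{false})^{r}_{\pi'}
}
\]

\[
\textsc{Setter: Non-Success} \quad
\frac{
  \mu.\mathsf{cons}_\gamma(\vec{v}_i)_\pi \rightsquigarrow (\mu', \mathsf{false})^{r}_{\pi'} \qquad r \neq S
}{
  \mu.\mathsf{set}_\gamma(\vec{v}_i \cdot \vec{v}_o)_\pi \rightsquigarrow (\mu', \mathsf{false})^{r}_{\pi'}
}
\]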
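
A minimal sketch of this derivation, assuming concrete memories and a consumer/producer pair supplied as plain functions: the getter consumes the resource and immediately produces it back, while the setter consumes it and produces the new out-parameters instead. The helper names (`derive_getter`, `derive_setter`) and the toy cell actions are hypothetical, and contexts are omitted.

```python
from typing import Any, Callable, Dict, Tuple

Memory = Dict[Any, Any]
Outcome = Tuple[Memory, Any, str]  # (memory, value, result), result in {"S", "E", "M"}
Consumer = Callable[[Memory, Any], Outcome]
Producer = Callable[[Memory, Any, Any], Outcome]


def derive_getter(cons: Consumer, prod: Producer) -> Callable[[Memory, Any], Outcome]:
    def get(mu: Memory, ins: Any) -> Outcome:
        mu1, outs, r = cons(mu, ins)
        if r != "S":
            return mu1, False, r           # propagate the consumer's error or miss
        mu2, _, r2 = prod(mu1, ins, outs)  # re-produce exactly what was consumed
        return mu2, outs, r2
    return get


def derive_setter(cons: Consumer, prod: Producer) -> Callable[[Memory, Any, Any], Outcome]:
    def set_(mu: Memory, ins: Any, new_outs: Any) -> Outcome:
        mu1, _, r = cons(mu, ins)
        if r != "S":
            return mu1, False, r
        return prod(mu1, ins, new_outs)    # frame on the updated resource
    return set_


# Toy cell actions used only to exercise the derivation.
def cons_cell(mu: Memory, addr: Any) -> Outcome:
    if addr not in mu:
        return mu, False, "M"
    return {a: v for a, v in mu.items() if a != addr}, mu[addr], "S"


def prod_cell(mu: Memory, addr: Any, value: Any) -> Outcome:
    if addr in mu:
        return mu, False, "E"
    return {**mu, addr: value}, True, "S"


get_cell = derive_getter(cons_cell, prod_cell)
set_cell = derive_setter(cons_cell, prod_cell)
```

Because both derived actions go through the consumer first, they inherit its missing-information behaviour: looking up or updating a cell that is not in the partial memory reports `M` rather than silently creating it.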


8 The $\pi \vdash \ldots$ denotes reasoning under context $\pi$. In the concrete case, it can be ignored.
Compositional State Models. Compositional memory models lift to compositional
state models, in a similar way to the lifting of the complete memory
models illustrated in [19]; see [33] for details. Here, we focus on memory action 
execution, which is lifted as follows to state action execution, given a memory 
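model $M(V, \Gamma)$ and $\alpha \in A_\Gamma \uplus A$:

\[
\mathsf{ea}(\alpha, (\mu, \rho, \pi), v) \triangleq \{ ((\mu', \rho, \pi'), v')^{r} \mid \mu.\alpha(v)_\pi \rightsquigarrow (\mu', v')^{r}_{\pi'} \}.
\]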


Observe how the context of the state is passed to the memory execution function, 
which may then strengthen it before passing it back to the resulting state. We 
can show that the PCM and well-formedness relation on memories lift to a PCM 
and well-formedness relation on states, and that state action execution maintains 
properties analogous to those given for memory models. 


2.2 GIL Verification 


We give an overview of Gillian verification based on separation logic (SL); see [33] 
for details. We describe GIL assertions, parameterised by the core predicates 
of the TL, define assertion satisfiability in a novel, parametric way using the 
core predicate producers, and provide a mechanism for using verified function 
specifications in GIL execution.

A compositional memory model with core predicates $\Gamma$ induces an SL-assertion
language, given below. GIL memory assertions, $p, q \in \mathcal{A}$, are formed using the
empty assertion, the separating conjunction, the core predicates, and user-defined
predicates, whose names come from a dedicated set, $\Delta \ni \delta$. The empty assertion
and the separating conjunction are standard. Core predicate assertions are lifted
from memory core predicates. User-defined predicates, introduced by example in
§3, are used by tool developers to characterise the interface of the TL, and
by code developers to describe the data structures in their programs. They have
in- and out-parameters like core predicates, and can have multiple definitions,
separated by a semi-colon. Assertions, $P, Q \in \mathit{Asrt}$, extend memory assertions
with pure first-order assertions, $\pi$, conflated with Boolean symbolic expressions.

GIL Assertion Syntax

\[
\begin{array}{rcl}
p, q \in \mathcal{A} & \triangleq & \mathsf{emp} \mid p \star q \mid \gamma(\vec{\hat{e}}_i \cdot \vec{\hat{e}}_o) \mid \delta(\vec{\hat{e}}_i \cdot \vec{\hat{e}}_o) \\
P, Q \in \mathit{Asrt} & \triangleq & \{ p \land \pi \mid p \in \mathcal{A}, \pi \in \hat{\Pi} \} \\
\mathit{pred} \in \mathit{Pred} & \triangleq & \mathsf{pred}\ \delta(\vec{\hat{x}}_i \cdot \vec{\hat{x}}_o) := P_1; \ldots; P_n
\end{array}
\]


Satisfiability. To define assertion satisfiability, we lift memory consumers and 
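producers from core predicates to memory assertions, denoted by $\mu.\mathsf{cons}_\theta(p)$ and
$\mu.\mathsf{prod}_\theta(p)$, and then to states and arbitrary assertions, denoted by $\sigma.\mathsf{cons}_\theta(P)$
and $\sigma.\mathsf{prod}_\theta(P)$, using substitutions $\theta : \hat{X} \rightharpoonup V$ (extended to symbolic expressions
inductively, in the standard way) to map core predicate assertions, with
parameters given by symbolic expressions, to the core predicates of the memory
model, with parameters given by values. We highlight the successful base case
of the memory assertion consumers, where the returned context requires the
out-parameters of the assertion to match the ones found in memory:

\[
\frac{
  \mu.\mathsf{cons}_\gamma(\theta(\vec{\hat{e}}_i))_\pi \rightsquigarrow (\mu', \vec{v}_o)^{S}_{\pi'} \qquad
  \pi'' = (\pi' \land \vec{v}_o = \theta(\vec{\hat{e}}_o)) \qquad
  \pi''\ \mathrm{SAT}
}{
  \mu.\mathsf{cons}_\theta(\gamma(\vec{\hat{e}}_i \cdot \vec{\hat{e}}_o))_\pi \rightsquigarrow (\mu', \mathsf{true})^{S}_{\pi''}
}
\]

and the successful consumption of an arbitrary assertion $P = p \land \pi$:

\[
\frac{
  \mu'.\mathsf{cons}_\theta(p)_{\pi'} \rightsquigarrow (\mu'', \mathsf{true})^{S}_{\pi''} \qquad
  \pi'' \vdash \theta(\pi)
}{
  (\mu', \rho, \pi').\mathsf{cons}_\theta(p \land \pi) \rightsquigarrow ((\mu'', \rho, \pi''), \mathsf{true})^{S}
}
\]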


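Definition 4 (Satisfiability). The satisfiability relation, stating that memory
$\mu'$ and context $\pi'$ satisfy assertion $p \land \pi$ under substitution $\theta$, is defined by:

\[
\mu', \pi', \theta \models p \land \pi \iff 0.\mathsf{prod}_\theta(p)_{\mathsf{true}} \rightsquigarrow (\mu_p, \mathsf{true})^{S}_{\pi_p} \;\land\; \pi' \vdash (\mu' = \mu_p \land \pi_p \land \theta(\pi))
\]

and is lifted to states as: $(\mu', \rho, \pi'), \theta \models p \land \pi$ if and only if $\mu', \pi', \theta \models p \land \pi$.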


In Definition 4, the production, when successful, creates the (unique) memory
$\mu_p$ that corresponds to the resource of the assertion $p$, with its (unique) well-formedness
constraints, $\pi_p$. In the concrete case, as the only allowed context is true,
the formulation simplifies to the more intuitive $0.\mathsf{prod}_\theta(p) \rightsquigarrow (\mu', \mathsf{true})^{S} \land \theta(\pi)$.
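
The concrete reading of this definition can be sketched as follows, under strong simplifying assumptions: the only core predicate is a cell, a memory assertion is a list of cell assertions joined by the separating conjunction, symbolic variables are strings starting with `#`, and the pure part is a Python predicate over the substitution. All names are illustrative. A memory satisfies the assertion exactly when producing the assertion in the empty memory succeeds, yields that same memory, and the pure part holds.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Memory = Dict[int, Any]
Subst = Dict[str, Any]      # substitution: symbolic variable -> value
CellAsrt = Tuple[Any, Any]  # cell core predicate: (address expression, value expression)


def apply(theta: Subst, expr: Any) -> Any:
    """Strings starting with '#' are symbolic variables; anything else is a literal."""
    if isinstance(expr, str) and expr.startswith("#"):
        return theta[expr]
    return expr


def produce(cells: List[CellAsrt], theta: Subst) -> Optional[Memory]:
    """Produce every cell in the empty memory; None signals a failed production."""
    mu: Memory = {}
    for addr_e, val_e in cells:
        addr = apply(theta, addr_e)
        if addr in mu:
            return None  # duplicated resource: the assertion is unsatisfiable
        mu[addr] = apply(theta, val_e)
    return mu


def satisfies(mu: Memory, cells: List[CellAsrt],
              pure: Callable[[Subst], bool], theta: Subst) -> bool:
    mu_p = produce(cells, theta)
    return mu_p is not None and mu == mu_p and pure(theta)
```

Note that satisfaction is exact: a memory with additional cells does not satisfy the assertion, which is what allows the remaining resource to be treated as a frame.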


Specifications. Gillian function specifications have the form $\{\hat{x}, P\}\, f(x)\, \{Q\}^{\hat{e}}$,
where $f$ is the function identifier, $x$ is the function parameter, $\hat{x}$ is the symbolic
variable holding the value of $x$, $P$ is the pre-condition, $Q$ is the post-condition, and
$\hat{e}$ is the return value of the function, with the following, well-known, constraints:


1. program variables do not appear in the pre- or the post-condition, and the
function parameter $x$ is accessed using the symbolic variable $\hat{x}$;

2. symbolic variables that appear in a pre-condition are implicitly universally 
quantified, and can be re-used in the corresponding post-condition; and 

3. symbolic variables that appear only in a post-condition are implicitly exis- 
tentially quantified. 


We extend GIL programs with function specifications, accessible via p.specs, 
and the GIL execution semantics with rules for folding and unfolding user-defined 
predicates, as well as with a rule for calling function specifications, the success 
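case of which is given below. Gillian verifies a specification $\{\hat{x}, P\}\, f(x)\, \{Q\}^{\hat{e}}$ if,
given the identity substitution $\hat{\theta}$ and a symbolic state $\hat{\sigma}$ with store $\{x \mapsto \hat{\theta}(\hat{x})\}$
such that $\hat{\sigma}, \hat{\theta} \models P$, the symbolic execution of $f$ starting from $\hat{\sigma}$ always terminates and,
for all final symbolic states $\hat{\sigma}_i$, there exists some $\hat{\theta}_i \geq \hat{\theta}$ such that $\hat{\sigma}_i, \hat{\theta}_i \models Q$, and
the corresponding return value equals $\hat{\theta}_i(\hat{e})$ under the context of $\hat{\sigma}_i$. We can prove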
that if Gillian verifies a specification, then its standard SL interpretation holds. 


SPEC CALL - SUCCESS 


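\[
\frac{
  \begin{array}{ll}
  \mathsf{cmd}(p, cs, i) = y := e(e') \text{ with } \theta & \text{function call with substitution } \theta \\
  \sigma.\mathsf{eval}_e(-) \rightsquigarrow f \qquad \sigma.\mathsf{eval}_{e'}(-) \rightsquigarrow v' & \text{get function id and parameter value} \\
  \{\hat{x}, P\}\, f(x)\, \{Q\}^{\hat{e}} \in p.\mathsf{specs} & \text{get one of the function specifications} \\
  \theta' = \theta[\hat{x} \mapsto v'] & \text{extend substitution with parameter value} \\
  \sigma.\mathsf{cons}_{\theta'}(P) \rightsquigarrow \{(\sigma_j, \mathsf{true})^{S}|_{j \in J}\} & \text{consume pre-condition} \\
  j \in J & \text{select a branch} \\
  \sigma_j.\mathsf{prod}_{\theta'}(Q) \rightsquigarrow (\sigma', \mathsf{true})^{S} & \text{produce post-condition} \\
  \sigma'.\mathsf{setVar}_y(\theta'(\hat{e})) \rightsquigarrow \sigma'' & \text{assign return value}
  \end{array}
}{
  p \vdash \langle \sigma, cs, i \rangle \rightsquigarrow \langle \sigma'', cs, i{+}1 \rangle
}
\]
Note that for this rule to succeed, the consumption of P must succeed. The rule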
is slightly simplified for presentation. First, it assumes to have the substitution 
upfront; in the implementation, we have a unification algorithm that, starting 
from the function parameter and using the consumers, learns the substitution. 
Second, it assumes that the post-condition does not introduce fresh symbolic 
variables; these are handled using allocators and added to the substitution. 
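
The following sketch illustrates the consume-then-produce reading of a specification call, together with the way the substitution is learned from the function parameter. It is a concrete, single-branch simplification, not Gillian's unification algorithm: the heap is a map from addresses to values, the only core predicate is a cell, the pre-condition is assumed to be ordered so that every in-parameter is already known when its cell is consumed, and the names `call_spec` and `INCR_SPEC` are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

Heap = Dict[int, Any]


def is_var(e: Any) -> bool:
    """Symbolic variables are strings starting with '#'."""
    return isinstance(e, str) and e.startswith("#")


def ev(theta: Dict[str, Any], e: Any) -> Any:
    """Evaluate a literal, a symbolic variable, or a function of the substitution."""
    if callable(e):
        return e(theta)
    if is_var(e):
        return theta[e]
    return e


def call_spec(heap: Heap, spec: Dict[str, Any], arg: Any) -> Optional[Tuple[Heap, Any]]:
    theta = {spec["param"]: arg}       # start from the function parameter
    frame = dict(heap)                 # what remains is the frame
    for in_e, out_e in spec["pre"]:    # consume the pre-condition
        addr = ev(theta, in_e)
        if addr not in frame:
            return None                # missing resource: the call cannot proceed
        found = frame.pop(addr)
        if is_var(out_e) and out_e not in theta:
            theta[out_e] = found       # learn the out-parameter from memory
        elif ev(theta, out_e) != found:
            return None                # out-parameter does not match memory
    for in_e, out_e in spec["post"]:   # produce the post-condition
        addr = ev(theta, in_e)
        if addr in frame:
            return None                # duplicated resource
        frame[addr] = ev(theta, out_e)
    return frame, ev(theta, spec["ret"])


# Hypothetical specification of a function that increments the cell it is given.
INCR_SPEC = {
    "param": "#x",
    "pre": [("#x", "#v")],
    "post": [("#x", lambda t: t["#v"] + 1)],
    "ret": lambda t: t["#v"] + 1,
}
```

For example, calling `call_spec({10: 4, 20: 9}, INCR_SPEC, 10)` updates cell 10 and leaves cell 20, which the specification never mentions, untouched in the frame.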


Remark. Due to space constraints, we have not been able to give the full tech- 
nical details of Gillian verification. These are available in the Gillian technical 
report [33], where we demonstrate that the overall GIL execution using composi- 
tional memory models is frame-preserving (up to the usual renaming of allocated 
memory locations) and prove a standard verification soundness result. 


3 Compositional Memory Models: JavaScript and C 


We present the compositional memory models of JS and C, giving the basic 
actions and core predicates, and some of the user-defined predicates that capture 
the intuitive interfaces of these languages. The key ideas behind compositional 
JS memory models were introduced in the JaVerT project [21, 20, 22]; we transfer
them to Gillian. We introduce the compositional C memory models, building
on the concrete block-offset memory model of CompCert [31], simplifying the
presentation.⁹ In doing so, we highlight a striking similarity between the JS and
C models that is the result of our emphasis on negative resource. 

The JS and C concrete compositional memory models are made up of 
building blocks that are assigned a unique location (or identifier) from a set 
of uninterpreted symbols, $\mathcal{L} \subset \mathcal{U}$: for JS, the building blocks are the extensible
objects; for C, they are the blocks of linear memory of a given size. Each building
block is divided into at least one component. For JS, each object has three
components: a property table, $h : \mathcal{S} \rightharpoonup \mathit{Val}$, partially mapping property names
(strings) to values; a domain, $d : \mathcal{P}(\mathcal{S})$, discussed shortly; and metadata, $m : \mathit{Val}$,
which keeps track of internal JS properties for that object [22]. For C, each block
has two components: the block contents $k : \mathbb{N} \rightharpoonup \mathit{Val}$, partially mapping offsets
(natural numbers) to values; and a bound, $n : \mathbb{N}$, discussed shortly. Finally, the
memory units are, intuitively, the parts of the memory components that cannot
be separated further: for JS, these are single object properties, domains, and
metadata; for C, these are single block cells and bounds. These memory units
directly correspond to the core predicates given in Definitions 6 and 7.

Compositional memory models must keep track of negative resource, which 
can come from two sources: allocation and deallocation. For JS and C, the 
negative information originating from allocation has infinite representation: in 
JS, a freshly created object is known to not have any properties; in C, a freshly
allocated block of a given size is known not to have offsets beyond that size.
This infinite information is captured, for JS, by the object domain whose meaning 


9 We assume that values have the same size in memory and omit permissions. Gillian-C
implements the full models, eliding the concurrency-related aspects of permissions.
is that any property not in the domain is absent, and, for C, by the block bound
whose meaning is that any accesses beyond that bound result in a buffer overrun
error. The negative information originating from deallocation is easier to handle,
tracked by a dedicated uninterpreted symbol, $\varnothing \in \mathcal{U}$. In JS, deallocation is at the
unit level: only object properties are deleted. This is captured by extending the
co-domain of property tables with $\varnothing$: that is, $h : \mathcal{S} \rightharpoonup \mathit{Val}_\varnothing$. In C, deallocation is
at the building-block level: only entire blocks can be deleted. This is captured by
extending the co-domain of blocks with $\varnothing$, indicating that a block has been freed.
Due to compositionality, any building block, component or unit can be missing.
In the theory, we capture this either implicitly, via absence from the domain of a
mapping (e.g., a missing object property for JS or a missing block cell for C), or
explicitly, using the symbol $\bot$ (e.g., a missing domain, metadata, or bound).


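Definition 5 (Compositional JS and C Memories). The PCMs of compositional
concrete JS and C memories, $|M_{JS}|$ and $|M_C|$, are given by the sets

\[
\begin{array}{l}
\mu \in |M_{JS}| : \mathcal{L} \rightharpoonup ((\mathcal{S} \rightharpoonup \mathit{Val}_\varnothing) \times \mathcal{P}(\mathcal{S})_\bot \times \mathit{Val}_\bot), \\
\mu \in |M_C| : \mathcal{L} \rightharpoonup ((\mathbb{N} \rightharpoonup \mathit{Val}) \times \mathbb{N}_\bot)_\varnothing,
\end{array}
\]

with composition defined as disjoint union, and empty memory 0. The PCMs of
compositional symbolic JS and C memories, $|\hat{M}_{JS}|$ and $|\hat{M}_C|$, are given by the sets

\[
\begin{array}{l}
\hat{\mu} \in |\hat{M}_{JS}| : \widehat{\mathit{Expr}} \rightharpoonup ((\widehat{\mathit{Expr}} \rightharpoonup \widehat{\mathit{Expr}}) \times \widehat{\mathit{Expr}}_\bot \times \widehat{\mathit{Expr}}_\bot), \\
\hat{\mu} \in |\hat{M}_C| : \widehat{\mathit{Expr}} \rightharpoonup ((\widehat{\mathit{Expr}} \rightharpoonup \widehat{\mathit{Expr}}) \times \widehat{\mathit{Expr}}_\bot)_\varnothing,
\end{array}
\]

with composition defined as (syntactic) disjoint union, and empty memory 0.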


In the above definition, symbolic memory models are simple liftings of the 
concrete ones. In the implementation, we employ heavy optimisation: for example, 
in Gillian-C, we have developed a complex tree representation of symbolic blocks 
inspired by [29], enabling tractable reasoning about arrays of symbolic size. 

Well-formedness of concrete memories addresses the relationship between 
positive and negative information, given for JS and C below: 


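\[
\begin{array}{l}
\mathsf{Wf}^{JS}(\mu) \triangleq \forall (h, d, -) \in \mathrm{codom}(\mu).\ d \neq \bot \implies \mathrm{dom}(h) \subseteq d \\
\mathsf{Wf}^{C}(\mu) \triangleq \forall (k, n) \in \mathrm{codom}(\mu).\ n \neq \bot \implies \mathrm{dom}(k) \subseteq [0, n)
\end{array}
\]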
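
As an illustration, the two concrete well-formedness conditions can be checked directly on a small executable model. The encoding is an assumption made for this sketch: a JS object is a triple of a property table, a domain (or `None` when the domain is missing) and metadata; a C block is either the marker `FREED` or a pair of contents and a bound (or `None` when the bound is missing).

```python
from typing import Any, Dict

FREED = object()  # marker for a deallocated C block


def wf_js(mu: Dict[Any, Any]) -> bool:
    """Every property present in an object lies inside its domain, when the domain is known."""
    return all(d is None or set(h) <= d for (h, d, _m) in mu.values())


def wf_c(mu: Dict[Any, Any]) -> bool:
    """Every cell present in a live block lies below its bound, when the bound is known."""
    for block in mu.values():
        if block is FREED:
            continue
        k, n = block
        if n is not None and not all(0 <= off < n for off in k):
            return False
    return True
```

The parallel between the two checks mirrors the similarity between the JS and C models: in both, the negative information (domain or bound) constrains which positive resource may be present.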


Well-formedness of symbolic memories additionally has to address separation 
of locations and separation in any other mappings with symbolic expressions 
in its domain (e.g. object properties for JS and offsets for C). We give the 
well-formedness criterion for the symbolic C memory: 


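\[
\mathsf{Wf}^{C}_{\hat{\pi}}(\hat{\mu}) \triangleq \hat{\pi} \vdash
\bigwedge_{\substack{\hat{l}, \hat{l}' \in \mathrm{dom}(\hat{\mu}) \\ \hat{l} \neq \hat{l}'}} \hat{l} \neq \hat{l}'
\;\land\;
\bigwedge_{\substack{(\hat{k}, -) \in \mathrm{codom}(\hat{\mu}) \\ \hat{o}, \hat{o}' \in \mathrm{dom}(\hat{k}),\ \hat{o} \neq \hat{o}'}} \hat{o} \neq \hat{o}'
\;\land\;
\bigwedge_{\substack{(\hat{k}, \hat{n}) \in \mathrm{codom}(\hat{\mu}) \\ \hat{o} \in \mathrm{dom}(\hat{k}),\ \hat{n} \neq \bot}} \hat{o} < \hat{n}
\]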


For our JS and C instantiations, the core predicates follow straightforwardly 
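from the units of their memory models.

\[
\textsc{CConsCell - Found} \quad
\frac{
  \mu(l) = (k, n) \qquad k(o) = v \qquad k' = k \setminus \{o\} \qquad \mu' = \mu[l \mapsto (k', n)]
}{
  \mu.\mathsf{consCell}([l, o]) \rightsquigarrow (\mu', v)^{S}
}
\]

\[
\textsc{SConsCell - Use After Free} \quad
\frac{
  \hat{\mu}(\hat{l}') = \varnothing \qquad \pi' = (\hat{l} = \hat{l}') \qquad (\pi \land \pi')\ \mathrm{SAT}
}{
  \hat{\mu}.\mathsf{consCell}([\hat{l}, \hat{o}])_\pi \rightsquigarrow (\hat{\mu}, \mathsf{false})^{E}_{\pi \land \pi'}
}
\]

\[
\textsc{SConsCell - Found} \quad
\frac{
  \begin{array}{c}
  \hat{\mu}(\hat{l}') = (\hat{k}, \hat{n}) \qquad \hat{k}(\hat{o}') = \hat{v} \\
  \pi' = ([\hat{l}, \hat{o}] = [\hat{l}', \hat{o}']) \qquad (\pi \land \pi')\ \mathrm{SAT} \\
  \hat{k}' = \hat{k} \setminus \{\hat{o}'\} \qquad \hat{\mu}' = \hat{\mu}[\hat{l}' \mapsto (\hat{k}', \hat{n})]
  \end{array}
}{
  \hat{\mu}.\mathsf{consCell}([\hat{l}, \hat{o}])_\pi \rightsquigarrow (\hat{\mu}', \hat{v})^{S}_{\pi \land \pi'}
}
\]

\[
\textsc{SConsCell - Missing Cell} \quad
\frac{
  \begin{array}{c}
  \hat{\mu}(\hat{l}') = (\hat{k}, \hat{n}) \qquad \pi' = (\hat{l} = \hat{l}') \land \hat{o} \notin \mathrm{dom}(\hat{k}) \\
  \pi_n = (\hat{n} = \bot \lor \hat{n} > \hat{o}) \qquad (\pi \land \pi' \land \pi_n)\ \mathrm{SAT}
  \end{array}
}{
  \hat{\mu}.\mathsf{consCell}([\hat{l}, \hat{o}])_\pi \rightsquigarrow (\hat{\mu}, \mathsf{false})^{M}_{\pi \land \pi' \land \pi_n}
}
\]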


Fig. 2: Selected rules for the consCell consumer. 


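Definition 6 (JS Core Predicates). JS has three core predicates, $\gamma_{JS} \in \Gamma_{JS}$:

— the property predicate, $(\hat{l}, \hat{p}) \mapsto \hat{v}$, which states that the property $\hat{p}$ of the object
at location $\hat{l}$ contains value $\hat{v}$ (including $\varnothing$ denoting property absence);

— the domain predicate, $\mathsf{domain}(\hat{l}, \hat{d})$, which states that object at location $\hat{l}$ has
no properties outside the finite set $\hat{d}$;

— the metadata predicate, $\mathsf{metadata}(\hat{l}, \hat{m})$, which states that object at location $\hat{l}$
has metadata $\hat{m}$.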


Definition 7 (C Core Predicates). C has three core predicates, $\gamma_C \in \Gamma_C$:¹⁰
— the cell predicate, $(\hat{l}, \hat{o}) \mapsto \hat{v}$, which states that the cell at offset $\hat{o}$ in the block
at location $\hat{l}$ contains value $\hat{v}$ (which, this time, does not include $\varnothing$);
— the bounds predicate, $\mathsf{bound}(\hat{l}, \hat{n})$, which states that any cell beyond offset $\hat{n}$
in block at location $\hat{l}$ is not there;
— the freed predicate, $\hat{l} \mapsto \varnothing$, which states that block at location $\hat{l}$ has been freed.


We illustrate the C predicate action execution functions, $\mathsf{ea}_C$ and $\widehat{\mathsf{ea}}_C$, respectively,
with a selection of rules for the C cell-predicate consumer, consCell, given in
Figure 2. The remaining rules, as well as the rules for their JS counterparts, $\mathsf{ea}_{JS}$
and $\widehat{\mathsf{ea}}_{JS}$, can be found in the Gillian technical report [33]. With this information,
we can define the compositional concrete and symbolic JS and C memory models. 


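Definition 8 (JS Memory Models). The compositional concrete and symbolic
JS memory models are defined, respectively, as $M_{JS}(\mathit{Val}, \Gamma_{JS}) \triangleq (|M_{JS}|, \mathsf{Wf}^{JS}, \mathsf{ea}_{JS})$
and $\hat{M}_{JS}(\widehat{\mathit{Expr}}, \Gamma_{JS}) \triangleq (|\hat{M}_{JS}|, \widehat{\mathsf{Wf}}^{JS}, \widehat{\mathsf{ea}}_{JS})$.

Definition 9 (C Memory Models). The compositional concrete and symbolic
C memory models are defined, respectively, as $M_C(\mathit{Val}, \Gamma_C) \triangleq (|M_C|, \mathsf{Wf}^{C}, \mathsf{ea}_C)$
and $\hat{M}_C(\widehat{\mathit{Expr}}, \Gamma_C) \triangleq (|\hat{M}_C|, \widehat{\mathsf{Wf}}^{C}, \widehat{\mathsf{ea}}_C)$.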

10 In full C and the Gillian-C implementation, memory values may be of different sizes,
and holes may exist between these values due to alignment restrictions. To address
this, the implemented cell assertion carries additional information related to, e.g.,
size and type, similarly to that of [4], and there also exists a hole core predicate.

The getters and setters for JS and C are defined using the methodology
described in §2.1. In particular, the JS getters and setters are given by $A_{JS}$ =
{getProp, setProp, getDomain, setDomain, getMetadata, setMetadata}, and the summary
of the execution of the symbolic getProp($\hat{l}$, $\hat{p}$) getter is illustrated below:


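[Diagram: decision flow of the symbolic getProp getter. Checks: $\hat{l} \in \mathrm{dom}(\hat{\mu})$, $\hat{p} \in \mathrm{dom}(\hat{h})$, $\hat{d} = \bot$, $\hat{p} \in \hat{d}$. Outcomes: missing object, the property value $\hat{h}(\hat{p})$, missing property, and $\varnothing$.]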

Similarly, the C getters and setters are given by $A_C$ = {getCell, setCell, getBound,
setBound, getFreed, setFreed} and the summary of the execution of the symbolic
getCell($\hat{l}$, $\hat{o}$) getter is illustrated below:

[Diagram: decision flow of the symbolic getCell getter. Checks: $\hat{l} \in \mathrm{dom}(\hat{\mu})$, $\hat{\mu}(\hat{l}) = \varnothing$, $\hat{o} \in \mathrm{dom}(\hat{k})$, $\hat{n} = \bot$. Outcomes: missing block, use after free, the cell value $\hat{k}(\hat{o})$, missing cell, and buffer overrun.]


The similarities in the two diagrams are evident, with the main difference being 
that JS getters do not throw errors, whereas C getters do. 
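
A concrete counterpart of the getCell case analysis can be sketched as below. The encoding is an assumption for illustration: a block is either the marker `FREED` or a pair of contents and a bound, with `None` standing for a missing bound, and the function returns a result flag (`S`, `E` or `M`) together with either the value or a short description. The ordering of the checks follows the outcomes named in the text rather than Gillian-C's implementation.

```python
from typing import Any, Dict, Tuple

FREED = object()  # marker for a deallocated block


def get_cell(mu: Dict[Any, Any], loc: Any, off: int) -> Tuple[str, Any]:
    if loc not in mu:
        return "M", "missing block"   # nothing is known about this block
    block = mu[loc]
    if block is FREED:
        return "E", "use after free"  # negative resource from deallocation
    contents, bound = block
    if off in contents:
        return "S", contents[off]
    if bound is not None and off >= bound:
        return "E", "buffer overrun"  # negative resource from allocation
    return "M", "missing cell"        # the cell may exist in a frame
```

The two error outcomes are exactly the cases in which negative resource is known: without the freed marker or the bound, the lookup could only report missing information.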


User-defined JS and C Predicates. Core predicates describe fundamental 
units of the TL memory model. On top, user-defined predicates build layers 
of abstraction to describe memory components and building blocks, standard 
library interfaces, all the way to complex data structures for particular code 
such as the AWS message header. Using Gillian notation, we present some of the 
JS and C user-defined predicates; in this notation: $\star$ and $\land$ are conflated to *,
with automatic differentiation between spatial and pure assertions;¹¹ predicate
definitions are separated with a semi-colon; and logical variables are prefixed with 
the # symbol and are implicitly existentially quantified in predicate definitions. 

Gillian-JS inherits many user-defined predicates from JaVerT [21], including 
simple ones for describing JS objects and their properties, as well as advanced 
ones for specifying scoping, function closures and prototype chains. We focus 
here on the new FrozenObject(o, proto, pvs) predicate, which describes a frozen
object¹² o with prototype proto and property-value pairs pvs. We first define the
predicate FrozenObjectProps(o, pvs) to grab the resource of the object properties:


pred FrozenObjectProps(o, pvs) : pvs = [ ];
  pvs = [#p, #v] :: #rpvs * DataPropConst(o, #p, #v) *
  FrozenObjectProps(o, #rpvs) ;


where DataPropConst(o, #p, #v) states that the object o has a non-writable prop- 
erty #p with value #v. We then add information about the object prototype and 
its non-extensibility using the JSObject(o, proto, ext) predicate, and also state 
that the object has no properties other than pvs using the domain core predicate: 


11 From the separation logic literature, the pure assertions can be regarded as dotted. 
12 A JS object is frozen if it cannot be extended and all its properties are non-writable.

pred FrozenObject(o, proto, pvs) :
  JSObject(o, proto, false) * FrozenObjectProps(o, pvs) *
  FirstProj(pvs, #ps) * ListToSet(#ps, #pss) * domain(o, #pss)


where FirstProj(pvs, #ps) means that the list #ps is the first projection of the 
list of pairs pvs, and ListToSet(#ps, #pss) means that the elements of the list 
#ps form the set #pss. 

Gillian-C, on the other hand, comes with user-defined predicates capturing, 
for example, arrays and blocks in memory, as well as automatically-generated 
predicates describing C structs, with support for nested structs. In particular, the 
array(b, off,c) predicate describes a contiguous fragment of a block b, starting 
from offset off, with contents described by the mathematical list c: 


array(b, off, c) : c = []; 
(b, off) -> #c * array(b, off+1, #d) * c = #c :: #d 


and the block(b, c) predicate captures an entire C block with contents c: 
block(b, c) : array(b, 0, c) * bound(b, |c|)


In the implementation, arrays also exist as core predicates. This allows us to 
reason about arrays automatically in the symbolic memory (e.g., to split an array 
into sub-arrays), supported by our tree representation of symbolic blocks, instead 
of requiring manual application of lemmas. 

Finally, we illustrate automatically generated struct-related predicates using 
the aws_byte_cursor structure given below, which contains two fields: an 
unsigned integer len; and a nullable pointer to an array of 8-bit unsigned integers 
buf. This struct is used for traversing the AWS message header (cf. §4), and 
is intended to capture an array in memory that starts at buf and has length len. 


struct aws_byte_cursor { 
size_t len; 
uint8_t *buf; 
} 


pred struct_aws_byte_cursor(cur, len, buf) : 
(cur == [#b, #o]) * ((#b, #o) -int64-> len) * 
((#b, #o +p 8) -int64-> buf) * 
is_ptr_or_null(buf) 


The generated predicate describes the struct’s layout in memory and gives basic 
typing information: it states that an aws_byte_cursor, starting from the position 
given by the pointer cur, occupies 16 bytes in memory (8 + 8, given by the type 
annotation int64), with the first 8 bytes taken by len, and the second 8 bytes 
(note the pointer addition +p) taken by buf, which is either a pointer or null. 
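
As a rough illustration of the layout described above, the following Python sketch packs a byte cursor as two 8-byte fields with the standard struct module. The format string "<QQ" (two little-endian unsigned 64-bit integers) and the helper names pack_cursor and unpack_cursor are assumptions made for illustration; they mirror the int64 annotations of the predicate rather than the ABI of any particular compiler.

```python
import struct

# Assumed layout: two 8-byte fields, len at offset 0 and buf at offset 8,
# mirroring the -int64-> annotations in struct_aws_byte_cursor.
CURSOR_FORMAT = "<QQ"
LEN_OFFSET = 0
BUF_OFFSET = 8


def pack_cursor(length, buf_addr):
    """Serialise a (len, buf) pair into the assumed 16-byte layout."""
    return struct.pack(CURSOR_FORMAT, length, buf_addr)


def unpack_cursor(raw):
    """Read len from the first 8 bytes and buf from the second 8 bytes."""
    (length,) = struct.unpack_from("<Q", raw, LEN_OFFSET)
    (buf_addr,) = struct.unpack_from("<Q", raw, BUF_OFFSET)
    return length, buf_addr


raw = pack_cursor(5, 0x1000)
print(struct.calcsize(CURSOR_FORMAT))  # total size of the assumed layout
print(unpack_cursor(raw))
```

A buf value of 0 plays the role of the null pointer permitted by is_ptr_or_null in this sketch.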


4 AWS Encryption SDK Message Header Specification 


The encrypted data handled by the AWS Encryption SDK is stored within a 
structure called a message [3]. The message format has two versions of similar 
complexity: we verify version 1; version 2 was introduced very recently. Messages 
consist of a header, a body, and a footer. Here, we describe only the structure of 
the header, as we are verifying header deserialisation. 
The AWS Encryption SDK message header is a sequence of bytes (buffer) 
divided into sections, as illustrated below; above each section is its length in bytes. 


Length (bytes): 1 | 1 | 2 | 16 | 2 | EC Length | 2 | EDK Length 
Section: Version | Type | Suite ID | Message ID | EC Length | Encryption Context | EDK Count | Encrypted Data Keys 


Length (bytes): 1 | 4 | 1 | 4 | 12 | 16 
Section: Content Type | Reserved Bytes | IV Length | Frame Length | IV | Authentication Tag 


Our approach is to abstract the header contents into a list and formulate pure 
predicates that describe its structure in a language-independent way. This allows 
us to then use the same abstractions as part of further, language-dependent, 
abstractions for both JS and C. Our design of the abstractions was informed by 
existing code annotations found in the implementations, which describe simple 
first-order properties of the code and, in the case of C, can also link to the 
CBMC [30] bounded model checker. However, these annotations are limited by 
the expressivity of JS and C, particularly when it comes to reflecting on the 
memory contents. Our predicates have no such limitations. 

We narrow down our exposition to the encryption context, as it illustrates well 
the language-independent and language-dependent aspects of our specification, 
and is also the section in which we discovered bugs in both implementations. 


Pure Specification of the Encryption Context. The encryption context 
(EC) is a sequence of bytes that describes a set of key-value pairs. Its structure 
is given in the diagram below. 


[Diagram: the two-byte count KC, followed by KC 2-field elements; each 2-field 
element consists of a key field (length kLen) and a value field (length vLen).] 


The first two bytes represent the number of key-value pairs, denoted by KC, 
and the rest describe the KC key-value pairs themselves. Keys and values are 
represented by sequences of bytes and, as they are of variable length, are serialised 
by first having two bytes that represent the length, followed by that many bytes 
of the actual key or value; we refer to this pattern as a field, and to a sequence of 
n fields as an n-element. Then, a key-value pair is serialised as a 2-field element, 
and all of the key-value pairs form a sequence of KC 2-field elements. 
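
The serialisation scheme just described can be sketched in Python as follows. This is a minimal illustration, not the SDK's code: the function names serialise_field and serialise_ec are hypothetical, lengths are written big-endian (matching the UInt16 predicate used in this section), and an empty set of pairs is mapped to the empty buffer, as in the EncryptionContext predicate.

```python
def serialise_field(data: bytes) -> bytes:
    """A field: a 2-byte big-endian length followed by that many bytes."""
    if len(data) > 0xFFFF:
        raise ValueError("field does not fit a 2-byte length")
    return len(data).to_bytes(2, "big") + data


def serialise_ec(pairs) -> bytes:
    """Serialise key-value pairs as: 2-byte count KC, then KC 2-field elements."""
    if not pairs:
        # Assumption: no pairs corresponds to the empty buffer.
        return b""
    if len(pairs) > 0xFFFF:
        raise ValueError("too many pairs for a 2-byte count")
    out = len(pairs).to_bytes(2, "big")
    for key, value in pairs:
        # Each key-value pair is one 2-field element: key field, then value field.
        out += serialise_field(key) + serialise_field(value)
    return out


example = serialise_ec([(b"k", b"va")])
print(example.hex())
```

For the single pair (b"k", b"va"), the buffer consists of the count 0x0001, the key field 0x0001 0x6b, and the value field 0x0002 0x76 0x61.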

We specify the EC by building layers of abstraction, from fields to elements to 
element sequences to the EC, each of which can either be complete, incomplete 
(partial, but with correct structure), or malformed (with incorrect structure). 
In the implementation, these are specified separately and are joined together in 
appropriate over-arching abstractions. Here, we focus on complete variants only. 

The Field(buf, pos, fld, len) predicate states that the buffer (list of bytes) buf, 
at index pos, holds a field with contents fld (list of bytes) and total length len: 


pred Field(buf, pos, fld, len) : 
(0 <= pos) * (#rFL = sub(buf, pos, 2)) * 
UInt16(#rFL, #fL) * 
(fld = sub(buf, pos+2, #fL)) * 
(len = 2+#fL) * (pos+len <= |buf|) 


This predicate uses the GIL operator sub(l, s, n), which returns the sublist of list l 
starting from index s and of length n, and also the UInt16(rn, n) predicate, which 
states that n is a 16-bit big-endian interpretation of the raw 2-byte list rn. The 
Element(buf, pos, fC, elem, len) predicate states that buffer buf at index pos holds 
a sequence of fC fields, with contents elem (a list of the appropriate field contents) 
and total length len. It is defined similarly to a standard linked-list predicate, 
with the ‘link’ being the fact that the list members are contiguous in memory: 


pred Element(buf, pos, fC, elem, len) : 
(fC = 0) * (0 <= pos) * (pos <= |buf|) * (elem = [ ]) * (len = 0); 
(0 < fC) * Field(buf, pos, #fld, #fL) * Element(buf, pos+#fL, fC-1, #rFs, 
#rL) * (elem = #fld :: #rFs) * (len = #fL+#rL) 
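
One way to read the Field and Element predicates is as executable checks over a concrete buffer. The Python sketch below follows the two definitions clause by clause, with the buffer given as a list of byte values. The function names uint16, field and element are illustrative, and returning None conflates the incomplete and malformed cases, which the actual specification keeps apart.

```python
def uint16(raw):
    """Big-endian interpretation of a raw 2-byte list, as in UInt16(rn, n)."""
    return (raw[0] << 8) | raw[1]


def field(buf, pos):
    """Return (fld, len) such that Field(buf, pos, fld, len) holds, else None."""
    if pos < 0 or pos + 2 > len(buf):
        return None
    f_len = uint16(buf[pos:pos + 2])        # #fL
    total = 2 + f_len                       # len = 2 + #fL
    if pos + total > len(buf):              # pos + len <= |buf|
        return None
    return buf[pos + 2:pos + 2 + f_len], total


def element(buf, pos, f_count):
    """Return (elem, len) such that Element(buf, pos, fC, elem, len) holds, else None."""
    if f_count < 0:
        return None
    if f_count == 0:                        # base case of the predicate
        return ([], 0) if 0 <= pos <= len(buf) else None
    head = field(buf, pos)
    if head is None:
        return None
    fld, f_len = head
    rest = element(buf, pos + f_len, f_count - 1)   # fields are contiguous
    if rest is None:
        return None
    rest_flds, rest_len = rest
    return [fld] + rest_flds, f_len + rest_len


buf = [0, 1, 107, 0, 2, 118, 97]            # two fields: [107] and [118, 97]
print(field(buf, 0))
print(element(buf, 0, 2))
```

The recursion in element mirrors the linked-list shape of the predicate: the position of the next field is determined by the total length of the current one.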


Next, analogously to Element, we define the Elements(buf, pos, eC, fC, elems, len) 
predicate, which states that the buffer buf, at index pos, holds a sequence of 
eC elements, each with fC fields, with contents elems (a list of the appropriate 
element contents) and of total length len. Finally, the EncryptionContext (buf, KVs) 
predicate states that the entire buffer buf is an EC with key-value pairs KVs, with 
all keys being unique: 


pred EncryptionContext(buf, KVs) : (buf = [ ]) * (KVs = [ ]); 
(#rKC = sub(buf, 0, 2)) * UInt16(#rKC, #KC) * (0 < #KC) * 
Elements(buf, 2, #KC, 2, KVs, #len) * 
FirstProj(KVs, #Ks) * Unique(#Ks) * (2+#len = |buf|) 
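
To make the clauses of EncryptionContext concrete, the following Python sketch checks a byte string against the predicate and returns the key-value pairs if it holds. It is a hedged illustration with hypothetical names (read_field, encryption_context), written iteratively rather than through the Elements predicate, and it again returns None for every failure without distinguishing incomplete from malformed input.

```python
def read_field(buf, pos):
    """Read one length-prefixed field at pos; return (contents, next_pos) or None."""
    if pos + 2 > len(buf):
        return None
    n = int.from_bytes(buf[pos:pos + 2], "big")
    end = pos + 2 + n
    if end > len(buf):
        return None
    return buf[pos + 2:end], end


def encryption_context(buf):
    """Return KVs if EncryptionContext(buf, KVs) holds for some KVs, else None."""
    if len(buf) == 0:
        return []                      # first disjunct: buf = [] and KVs = []
    if len(buf) < 2:
        return None
    kc = int.from_bytes(buf[0:2], "big")
    if kc == 0:
        return None                    # second disjunct requires 0 < #KC
    pos, kvs = 2, []
    for _ in range(kc):                # KC elements, each with 2 fields
        key = read_field(buf, pos)
        if key is None:
            return None
        value = read_field(buf, key[1])
        if value is None:
            return None
        kvs.append((key[0], value[0]))
        pos = value[1]
    keys = [k for k, _ in kvs]
    if len(set(keys)) != len(keys):
        return None                    # Unique(#Ks) fails
    if pos != len(buf):
        return None                    # 2 + #len = |buf| fails (trailing bytes)
    return kvs


print(encryption_context(bytes.fromhex("000100016b00027661")))
```

The last two checks correspond to the uniqueness and exact-length conjuncts, which are easy to omit in a hand-written parser.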


Next, we show how this pure specification of the EC contents can be connected 
without modification to both the JS and C memories. 


Encryption Context in JS. In JS, the EC is serialised as an ArrayBuffer, 
which is a raw binary data buffer in memory, and accessed using a Uint8Array, 
which is a view on top of that ArrayBuffer starting from a given offset and of a 
given length, treating the raw data underneath as 8-bit unsigned integers. This 
Uint8Array view is similar in function to the aws_byte_cursor C structure (cf. §3). 
Abstracting ArrayBuffer contents to lists, we connect these data structures in JS 
memory to our pure EC specification (cf. Figure 3, top and centre): 


pred JSSerEC(o, EC, KVs) : 
Uint8Array(o, #aBuf, #off, #len) * ArrayBuffer(#aBuf, #data) * 
(EC = sub(#data, #off, #len)) * EncryptionContext(EC, KVs) 


In JS, the EC is deserialised into a frozen JS object with prototype null, 
whose properties represent the keys and hold the values. This is done by converting 
the keys and the values to UTF-8 strings, and is specified as follows: 


pred JSDeserEC(o, KVs) : toUtf8(KVs, #sKVs) * FrozenObject(o, null, #sKVs) 


where toUtf8 converts the list KVs point-wise to strings, obtaining #sKVs. 
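
As a loose analogy in Python (which has neither JS prototypes nor Object.freeze), the deserialised form can be pictured as a read-only mapping from decoded keys to decoded values. The sketch below is only that analogy: deserialise_ec is a hypothetical name, bytes.decode stands in for toUtf8, and types.MappingProxyType stands in for a frozen object with prototype null.

```python
from types import MappingProxyType


def deserialise_ec(kvs):
    """Decode keys and values point-wise and return a read-only mapping."""
    decoded = {}
    for key, value in kvs:
        decoded[key.decode("utf-8")] = value.decode("utf-8")
    # The proxy rejects item assignment, loosely playing the role of freezing.
    return MappingProxyType(decoded)


ec = deserialise_ec([(b"k", b"va")])
print(dict(ec))
print("toString" in ec)  # membership looks only at the stored keys
```

The second print is the point of the null prototype in the JS setting: a lookup for a key that was never stored should not find anything inherited from elsewhere.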
[Figure 3 diagram. JS resource (top): Uint8Array(o, #aBuf, #off, #len) and 
ArrayBuffer(#aBuf, #data). Pure part (middle): ECContents(EC, KVs), with 
KVs = [kv_1, ..., kv_KC]. C resource (bottom): struct_aws_byte_cursor(cur, buf, EC) 
and array(#b, #off, EC), with buf = [#b, #off].] 


Fig. 3: Serialised Encryption Context: language-independent pure part (red; 
middle) and language-specific resource (green; JS above, C below) 


Finally, the specification of the decodeEncryptionContext function states that 
the EC deserialisation is performed correctly: 


{ JSSerEC(eEC, #EC, #KVs) } 
function decodeEncryptionContext(eEC) 
{ PRE-CONDITION * JSDeserEC(ret, #KVs) } 


Encryption Context in C. In C, the EC is serialised as a block in memory, 
and is traversed using an AWS byte cursor. Using the auto-generated predicate 
given in §3, we define the aws_byte_cursor(cur, buf, c) predicate, stating that 
cur points to a byte cursor which has access to an array starting from buf, and 
holding contents c, making the length implicit: 


pred aws_byte_cursor(cur, buf, c) : 
struct_aws_byte_cursor(cur, #len, buf) * (buf = [#b, #off]) * 
array(#b, #off, c) * (#len = |c|) 


A serialised EC can then be described as a valid byte cursor whose contents 
represent the EC key-value pairs (cf. Figure 3, centre and bottom): 


pred CSerEC(cur, buf, EC, KVs) : 
aws_byte_cursor(cur, buf, EC) * EncryptionContext(EC, KVs) 


In C, the EC is deserialised into an AWS hash table, whose keys and values 
directly correspond to the key/value pairs of the EC, specified as follows, eliding 
the internal structure of the hash tables due to space constraints: 


pred CDeserEC(ht, KVs) : valid_hash_table(ht, KVs) 


The specification of the EC deserialisation function is more complex than for 
JS. In particular, the byte cursor that originally pointed to the EC ends up shifted 
to the end of the byte buffer, exposing the array underneath the CSerEC predicate. 


{ empty_hash_table(ec) * CSerEC(cur, #buf, #EC, #KVs) } 
int aws_cryptosdk_enc_ctx_deserialize( 
struct aws_hash_table *ec, struct aws_byte_cursor *cur) 
{ (ret = 0) * CDeserEC(ec, #KVs) * (#buf = [#b, #off]) * 
array(#b, #off, #EC) * aws_byte_cursor(cur, #buf +p |#EC|, []) } 
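
The shape of this post-condition can be pictured with a small Python model of a cursor being consumed. The class ByteCursor and the function consume_all_fields below are hypothetical stand-ins, not the aws-c-common API: the cursor is a window (offset, length) over an immutable block, and reading shifts the window while leaving the block itself untouched.

```python
class ByteCursor:
    """Hypothetical stand-in for aws_byte_cursor: a window of `length` bytes
    starting at offset `off` inside `block`."""

    def __init__(self, block, off, length):
        self.block = block
        self.off = off
        self.length = length

    def advance(self, n):
        """Return the next n bytes and shift the window, or None if too short."""
        if n < 0 or n > self.length:
            return None
        chunk = self.block[self.off:self.off + n]
        self.off += n
        self.length -= n
        return chunk


def consume_all_fields(cur):
    """Read length-prefixed fields until the cursor is exhausted."""
    fields = []
    while cur.length > 0:
        raw_len = cur.advance(2)
        if raw_len is None:
            return None
        data = cur.advance(int.from_bytes(raw_len, "big"))
        if data is None:
            return None
        fields.append(data)
    return fields


# One unrelated leading byte, then two fields; the cursor starts at offset 1.
block = bytes.fromhex("ff" + "00016b" + "00027661")
cur = ByteCursor(block, 1, len(block) - 1)
print(consume_all_fields(cur))
print(cur.off, cur.length)
```

After a successful run the window is empty and sits at the original offset plus the number of bytes consumed, which corresponds to the cursor at #buf +p |#EC| with contents [] in the specification, while the array itself is still described separately.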
5 AWS Encryption SDK Message Header Verification 


Using Gillian-JS and Gillian-C, together with the specifications given in §4, we 
verify full functional correctness of the header deserialisation module of the AWS 
Encryption SDK JS [2] (~200loc) and C [1] (~950loc) implementations. In particular, 
we verify that the deserialisation of a complete header is correct, and the 
deserialisation of an incomplete or a malformed header raises an appropriate error. 


Verification Effort and Performance. The JS verification took 3 person- 
months and the C verification took 2 person-months, with the latter taking 
less time because a large part of the infrastructure developed for JS could be 
re-used. We substantially improved the first-order solver of Gillian to reason 
automatically about complex operations on lists of symbolic length, first used in 
the modelling of JS ArrayBuffers and then for C dynamic arrays. We created a 
collection of language-independent predicates and lemmas about their inductive 
properties (~1.2kloc) that cover the project-specific AWS header, but also re- 
usable first-order concepts such as list element uniqueness, projections of lists 
of pairs, conversion from bytes to numbers, and conversion from raw bytes to 
strings. Similarly, we also had to create language-dependent abstractions and 
associated lemmas for the JS and C manipulation of the AWS message header 
(~1.2kloc). Finally, we had to: annotate the code with specifications and loop 
invariants, with the latter often having more than twenty components; manually 
apply lemmas to prove numerous complex entailments; and manually unfold 
user-defined predicates at times (the folding is automated) (~1.1kloc). 

On a machine with an Intel Core i7-4980HQ CPU 2.80 GHz, DDR3 RAM 
16GB, and a 256GB solid-state hard-drive running macOS, the JS verification 
takes approximately 45 seconds and the C verification takes approximately six 
minutes. The C time is longer, in part due to the larger codebase, but mainly due 
to the complexity of the implementation of the full C memory model, which is 
able to reason about arrays of symbolic size. This requires frequent satisfiability 
checks and (for the moment) branching on non-zero array size. These times could 
both be improved with the implementation of basic merging techniques. 


JS Verification: Bugs/Improvements. We discovered two bugs and improved 
one function implementation to link better with the underlying data structure. 


— In the decodeEncryptionContext function, the object representing the de- 
serialised EC originally had prototype Object.prototype which, in this case, 
due to the prototype inheritance of JavaScript, meant that if an EC key 
coincided with a property of Object.prototype, an error would be thrown 
incorrectly. This bug was predicted theoretically in [21], and has since been 
found in several real-world libraries [42], including cash and jQuery. 

— In the same function, in one of the branches the deserialised EC was returned 
non-frozen, which constituted a potential vulnerability in that third parties 
could alter non-secret, but authenticated data. 

— The readElements(eC, fC, buf, pos) function, which reads eC elements with 
fC fields from buffer buf at index pos into a JS array of arrays, was misaligned 
with the underlying data structures. Its parameters were non-intuitive (it 
received eC · fC, buf, and pos), and it used complex array operations to re-form 
the final return value. We re-implemented this function to construct the 
returned array of arrays efficiently, simplifying specification and verification, 
and our implementation was integrated into the codebase. 


JS Verification: Caveats. Our JS verification is correct up to the following 
caveats. First, as the AWS SDK JS implementation is written in TypeScript, 
we elide types to obtain JS; this could be automated, potentially generating 
predicates from the types. Next, some ES6 features, such as patterns in function 
parameters, are not yet supported by Gillian-JS; these we rewrite to ES5 Strict, 
preserving their meaning. Next, we use axiomatic specifications of the ArrayBuffer, 
DataView, and Uint8Array ES6 built-in libraries, as well as of the Object.freeze 
and Array.prototype.map built-in functions. These would ideally be accompanied 
with implementations, tested against the official Test262 test suite [16] and verified 
against their specifications. Finally, as Gillian does not support higher-order 
reasoning, we axiomatise the toUtf8 function, passed into the deserialisation 
module as a parameter, as an injective function from raw bytes to JS strings. 


C Verification: Bugs. We discovered three bugs: one logical error; one undefined 
behaviour; and one over-allocation. 


— The deserialisation of the EC mishandled the case when there is not enough 
data to read it entirely, continuing to read the EDK instead of reporting an 
error. This allows some malformed headers to be parsed as well-formed. 

— The function aws_byte_cursor_advance, when called with a NULL cursor 
and a length of 0, resulted in NULL + 0 being computed, which is undefined 
behaviour, although not problematic for most compilers. 

— The deserialised EC was stored using aws_string, which extends C strings 
with certain metadata. It is implemented using a structure that includes a 
flexible array member. We discovered that string creation over-allocated this 
array by 8 bytes, because our (correct) predicate describing aws_strings 
was not allowing the verification to go through. 


C Verification: Caveats. Our C verification is correct up to the following 
caveats. First, we do not use the aws_byte_cursor_advance_nospec function, 
which advances the byte cursor, but also uses complex computation to protect 
against the Spectre bug. We instead use aws_byte_cursor_advance, which has 
equivalent behaviour, as our specifications are not expressive enough to capture 
this distinction. Next, we axiomatise the functions of the AWS hash tables and 
array list libraries, as their verification is of comparable complexity to the entire 
deserialisation module. Finally, the AWS allocators of the C implementation, 
which are passed into some of the functions, contain pointers to memory man- 
agement functions; this is higher-order in nature. In the verification, we assume 
those functions are malloc, calloc, and realloc. 
6 Related Work 


The literature explores many techniques and tools for verifying JS [44,18,22,21] 
and C PB3P6RSI3I7]. We describe: multi-language verification architectures; JS 
and C verification tools based on separation logic; C memory models related to 
our models; and other analyses applied to the AWS Encryption SDK. 


Multi-Language Verification Architectures. The multi-language verification 
architectures closest to Gillian are CORESTAR [6] and VIPER [36,35]. Both of 
these architectures were designed to serve as verification back-ends for TLs and 
both have at their core a simple intermediate representation with a dedicated 
symbolic execution engine.¹³ However, they work with the TL in different ways. 

In CORESTAR, TL core assertions are modelled as abstract predicates and 
memory actions as function calls. The function specifications play the role of 
our consumer and producer actions. The user also has to provide logical axioms, 
describing properties of the abstract predicates. The Gillian equivalent of these 
axioms are the implementations of the memory actions using consumers and 
producers, which can be optimised, but require understanding of the inner 
workings of Gillian. Like Gillian, CORESTAR’s symbolic execution engine is 
parametric on the underlying logical theory and can thus be used to reason 
about any memory model representable using abstract predicates. It is, however, 
unclear how efficiently this can be done. CORESTAR has been used inside the tool 
JSTAR [15], which has verified implementations of several Java design patterns 
but was not pushed to more complex Java code. In [21], the authors observed 
that CORESTAR was not able to handle tractably even simple JS programs. 

Unlike Gillian and CORESTAR, VIPER [35,36] comes with a fixed intermediate 
language, also called VIPER. The user encodes their memory model and 
corresponding core assertions into the memory model and assertion language of 
VIPER. A key advantage of VIPER lies in its expressive permission model, which 
includes fractional, recursive, and abstract read permissions, as well as in its 
support for custom mathematical domains, which enable users to extend VIPER 
with their own first-order theories, tailored to the data structures at hand. VIPER 
has mechanisms similar to our consumer and producer actions, called inhale and 
exhale. VIPER can reason about both sequential and concurrent programs, and 
has been used to verify programs written in Java, Go, Rust, and Python, but 
not JS and C. In fact, it is not clear to us how difficult it would be to use VIPER 
to reason about JS objects and the linear memory of C, as neither can be simply 
expressed using the static objects natively provided by VIPER. 


Semi-automatic JS and C Verification Tools. There are very few ver- 
ification tools for JS based on separation logic. For example, JAVERT [21] 
has been used to verify simple sequential data-structure algorithms. Its succes- 
sor, JAVERT 2.0 [22], provides whole-program symbolic testing, verification 
and bi-abductive reasoning [10], unified by a core symbolic execution engine. 


13 VIPER includes both a symbolic execution engine and a verification condition generator 
based on Boogie [5] for its intermediate language. 
JAVERT 2.0 verification is more efficient than JAVERT verification, but has 
still only been applied to simple data-structure algorithms. Gillian [19] builds on 
JAVERT 2.0, taking the highly non-trivial step of designing the intermediate 
language, correctness results, and implementation to be parametric on the TL 
memory models. Despite this generalisation, Gillian substantially outperforms 
JAVERT 2.0, both for symbolic testing and for verification. 

VERIFAST [26] and the tool in [7] are prominent examples of semi-automatic 
tools that provide functionally-correct verification of C programs using separation- 
logic specifications. These tools work with C fragments and simplified memory 
models. While the tool in [7] has not been applied to real-world code, VERIFAST 
has been used to verify, e.g., an implementation of a Policy Enforcement Point 
(PEP) for Network Admission Control scenarios [38]. One difference between 
these tools and Gillian is that Gillian specifications can express negative resource, 
allowing us to differentiate missing resource errors from use-after-free errors. 
However, VERIFAST, unlike Gillian, supports reasoning about concurrent programs. 
There is also much work on using theorem provers to verify both sequential 
and concurrent C code using separation logic: see, for example, the DeepSpec 
project [45] and the Iris project [47], which we do not describe here. 


Related Formal C Memory Models. Our compositional C memory models 
were inspired by CompCert [32] and the CH2O formalisation of Krebbers [29]. 
In particular, our concrete C model is adapted from the complete model of 
CompCert, which supports reasoning about programs that access in-memory 
data representations. This feature is used by the AWS deserialisation algorithm, 
which reads the buffer contents at the byte-granularity. 

We present our compositional symbolic C memory model in this paper as a 
simple lifting of the concrete one. Our implementation is more complex, however, 
representing blocks as trees holding symbolic values and combining the concepts 
of memory trees and abstract values from the concrete memory model of the 
CH2O formalisation. Although not mentioned in [29], CH2O does keep track of 
some negative resource in that it maintains freed locations, but not block bounds. 


Analysis of the AWS Encryption SDK. Amazon has recently directed con- 
siderable effort towards the formal analyses of their codebase, with a number of 
tools incorporated into their CI pipeline. For example, the main cryptographic 
algorithms of the AWS Encryption SDK have certified implementations in the 
specification language Cryptol [17], underpinned by SAW [12]. These implementations, 
however, have not yet been proven equivalent to the corresponding C 
implementation. In addition, the C implementation of the AWS Encryption SDK 
includes a symbolic test suite run using CBMC [30]. This implementation makes 
heavy use of the aws-c-common data-structure library, which is annotated with 
first-order assertions checked by CBMC. CBMC is a mature, industrial-strength 
tool, likely to outperform and have broader coverage than the symbolic test- 
ing of Gillian-C, with substantially fewer annotations than Gillian verification. 
However, as CBMC is a bounded model checker, it provides weaker correctness 
guarantees and is not compositional. Its expressivity is also somewhat constrained 
by the expressivity of the C runtime. For example, it does not allow reasoning 
about the size of allocated memory. Gillian specifications have this expressivity, 
as highlighted by the discovered over-allocation bug. The subtle logical bug 
found by Gillian also demonstrates the importance of being able to express full, 
functionally-correct specifications. We believe there has been no previous analysis 
of the JS implementation of AWS Encryption SDK. 


7 Conclusions 


We have introduced compositional verification to the Gillian platform. Our work 
includes a methodology for designing compositional TL memory models, distin- 
guishing negative resource from missing resource and using the JS and C memory 
models as demonstrator examples. It also includes a novel, parametric approach 
to assertion interpretation, independent of the TL, enabling compositional use 
of function specifications in verification. We have been able to push the Gillian 
verification to self-contained, critical, real-world AWS JS and C code. The bugs 
and suggestions for code improvements that arose during this verification process 
have all been accepted by the developers and incorporated into the codebase. To 
our knowledge, this is the first time that industry-grade JS code has been fully 
verified and the first time that, in one verification platform, the same abstractions 
were used to verify industry code from languages as different as JS and C. The 
artifact accompanying this paper can be found at [34], and the entire Gillian 
development at [46]. In future, we will publish correctness results for Gillian 
verification [33], as part of an in-depth theoretical study of program correctness 
and incorrectness for symbolic testing, verification and bi-abductive reasoning 
being developed in Gillian. 
Abstract. In this industrial case study we describe a new network 
troubleshooting analysis used by VPC REACHABILITY ANALYZER, an 
SMT-based network reachability analysis and debugging tool. Our trou- 
bleshooting analysis uses a formal model of AWS Virtual Private Cloud 
(VPC) semantics to identify whether a destination is reachable from a 
source in a given VPC configuration. In the case where there is no feasi- 
ble path, our analysis derives a blocked path: an infeasible but otherwise 
complete path that would be feasible if a corresponding set of VPC con- 
figuration settings were adjusted. 

Our blocked path analysis differs from other academic and commercial 
offerings that either rely on packet probing (e.g., TCPTRACE) or provide 
only partial paths terminating at the first component that rejects the 
packet. By providing a complete (but infeasible) path from the source to 
destination, we identify for a user all the configuration settings they will 
need to alter to admit that path (instead of requiring them to repeatedly 
re-run the analysis after making partial changes). This allows users to 
refine their query so that the blocked path is aligned with their intended 
network behavior before making any changes to their VPC configuration. 


1 Introduction 


This paper describes a new network connectivity troubleshooting analysis used 
by VPC REACHABILITY ANALYZER, a service that analyzes Amazon Web Ser- 
vices’ (AWS) Virtual Private Cloud (VPC) configurations. 

VPCs are user-configured networks of virtual compute devices and resources. 
AWS VPC offers dozens of networking components and controls to give users 
flexibility in configuring their networks. Access to these resources is logically 
isolated within virtual networks configured by the users. As VPCs grow in size 
and complexity, users can increasingly benefit from automation to identify and 
resolve misconfigurations, as well as to validate that applications maintain secu- 
rity and availability invariants through infrastructure changes. 

VPC REACHABILITY ANALYZER uses the TIROS [2] formal model of AWS 
VPC networking semantics to identify whether a destination is reachable from a 
source in a given VPC configuration. If the destination is reachable, then TIROS 
identifies a feasible path from the source to the destination, where a path is 
a sequence of network components associated with incoming and/or outgoing 
packet header assignments (protocol, addresses, ports). The outgoing packet 
header of one component is the incoming packet header of the next component. 
Paths may also identify relevant VPC configuration details such as the specific 
routes, firewall rules, or other settings admitting the packet at each step. Each 
component in a VPC may accept or reject incoming and outgoing packet headers; 
a feasible path is a path in which every component on the path accepts both its 
incoming and outgoing packet header. 

TIROS’s analysis is static, i.e., TIROS does not inject traffic into VPC con- 
figurations, and is complete for the subset of AWS VPC semantics it supports: 
if there exists a path connecting the source and destination, TIROS will find it. 
Since 2018, TIROS has powered the commercially available Network Reachabil- 
ity assessment in AMAZON INSPECTOR [1], statically identifying ports on EC2 
Instances (virtual machines) accessible outside of their VPCs. 

In this work, we extend TIROS by introducing a new diagnostic blocked path 
analysis when there is not a feasible path, to help users understand why their 
query is infeasible. A blocked path is a path as defined above, in which at least 
one component rejects its incoming or outgoing packet, along with one or more 
blocking reasons: elements of the VPC configuration preventing one or more 
components on the path from accepting packets. The blocked path identifies a 
sufficient set of blocking reasons, such that if each were addressed the query 
would be satisfiable. 
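
The difference between a partial path and a complete blocked path can be illustrated with a toy data model. The Python sketch below is purely illustrative: the Step class, the component names, and the reason strings are invented for the example and do not correspond to the TIROS encoding or to any AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One component on a path, with the reasons (if any) it rejects the packet."""
    component: str
    blocking_reasons: list = field(default_factory=list)


def is_feasible(path):
    """A path is feasible when no component on it rejects the packet."""
    return all(not step.blocking_reasons for step in path)


def partial_path(path):
    """Prefix of the path up to and including the first rejecting component."""
    for i, step in enumerate(path):
        if step.blocking_reasons:
            return path[:i + 1]
    return path


def all_blocking_reasons(path):
    """Every (component, reason) pair along the complete blocked path."""
    return [(s.component, r) for s in path for r in s.blocking_reasons]


# Hypothetical blocked path from an instance to a gateway.
blocked = [
    Step("instance-a"),
    Step("security-group-1", ["no outbound rule for tcp/443"]),
    Step("route-table-1", ["no route to 0.0.0.0/0"]),
    Step("gateway"),
]
print(is_feasible(blocked))
print([s.component for s in partial_path(blocked)])
print(all_blocking_reasons(blocked))
```

In this example the partial path stops at the first rejecting component and would surface only one reason, whereas the complete blocked path reaches the destination and lists both settings that need attention.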

Previous tools for connectivity diagnosis typically provide a partial path, 
up to the first component/rule that rejects the packet; in some cases those tools 
also identify a single blocking reason. Remediations based on a partial path may 
address that initial blocking reason only to discover that remediations are still 
necessary, or that the remediation may be working towards a path that the 
user ultimately will reject. Providing a complete blocked path connecting the 
source and destination allows users to ensure that their intent is aligned with 
our diagnosis before taking any corrective actions. 

Our contributions in this work are: 


1. Identifying the notion of a blocked path as a useful medium for conveying a 
network diagnosis and aligning it with a user’s intent, 

2. Demonstrating how blocked paths can be efficiently derived at scale, 

3. Describing VPC REACHABILITY ANALYZER, a commercial tool based on 
these insights. 


2 Background 


2.1 Related Works 


Many previous works have proposed network reachability diagnosis tools, includ- 
ing both widely-used industry tools and academic literature. These tools can be 
broadly divided into model-based and non-model-based approaches. 
Non-model-based network diagnostic tools include system applications such 
as IPTRACE and TCPTRACE, commercial tools such as Cisco Packet Tracer [7], 
and academic works such as Tulip [12]. These tools trace live packets through 
a network or routing device, identifying the sequence of addresses of devices 
that accept the packet. Packet tracing tools lack visibility into the configuration 
settings that block and route packets. 

Model-based tools [2,5,6,13,16] statically analyze reachability between a 
specified source and destination in a network or routing device. Rather than 
transmitting live packets, these tools use formal methods such as constraint 
solvers to rigorously identify feasible paths. Existing model-based tools provide 
control-plane level information when there is a feasible path, but produce either 
no information for unreachable paths, or identify only the first (out of potentially 
many) reasons why a path is blocked. 

Our blocked path analysis is based on deriving minimal correction subsets 
(described below), which several previous works have proposed for general- 
purpose SAT-based error diagnosis or repair [4,8,9, 17]. 


2.2 Minimal Correction Subsets 


The blocked path analysis we describe in Sect. 3 relies on two related concepts: 
Maximal Satisfiable Subsets (MSS) and Minimal Correction Subsets (MCS), 
which we define below. Following the definitions from [14]: 


Definition 1 (MSS). $S \subseteq F$ is a Maximal Satisfiable Subset of constraints $F$ 
iff $S$ is satisfiable and $\forall c \in F \setminus S$, $S \cup \{c\}$ is unsatisfiable. 


Definition 2 (MCS). $C \subseteq F$ is a Minimal Correction Subset of constraints $F$ 
iff $F \setminus C$ is satisfiable and $\forall c \in C$, $(F \setminus C) \cup \{c\}$ is unsatisfiable. 


The complement of an MCS, $F \setminus \mathrm{MCS}(F)$, is guaranteed to be a maximal 
satisfiable subset of $F$; for this reason the MCS is sometimes called the coMSS.¹ 

In general, the MCS and MSS are not guaranteed to be unique. There is 
a close connection between the definition of a Maximal Satisfiable Subset and 
MAXSAT [10]: The largest MSS (and therefore smallest MCS) corresponds to 
a solution to MAXSAT. Indeed, one approach for computing the MCS is to 
compute MAXSAT and take the complement. Efficient algorithms for directly
computing the (not necessarily smallest) MCS without computing MAXSAT are
available and are typically much faster than computing MAXSAT; a good survey
of MCS algorithms including an empirical evaluation can be found in [14]. 
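
To make the non-uniqueness concrete, the following minimal sketch enumerates every MCS of a four-clause formula by exhaustive search. It is only an illustration of Definition 2: the brute-force `is_sat` helper, the DIMACS-style clause encoding, and the example formula are assumptions made here, not part of the surveyed algorithms.

```python
from itertools import combinations, product

def is_sat(clauses):
    """Brute-force SAT check; a clause is a tuple of non-zero ints (DIMACS style)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def is_mcs(candidate, formula):
    """Definition 2: F minus C is satisfiable, and adding back any c in C breaks it."""
    rest = [c for c in formula if c not in candidate]
    return is_sat(rest) and all(not is_sat(rest + [c]) for c in candidate)

def all_mcs(formula):
    """Enumerate every MCS of a (small) formula by exhaustive search."""
    return [set(subset)
            for size in range(len(formula) + 1)
            for subset in combinations(formula, size)
            if is_mcs(subset, formula)]

# Hypothetical example: F = { x, y, not x, (not x or not y) } with x = 1, y = 2.
F = [(1,), (2,), (-1,), (-1, -2)]
for mcs in all_mcs(F):
    print(sorted(mcs))
```

On this formula the search finds one MCS of size one and two of size two; only the smallest corresponds to a MAXSAT solution, while all three satisfy Definition 2.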

In constraint optimization problems, it is common to consider hard and soft
constraints, in which only the soft constraints may be relaxed. Definition 2
assumes that all constraints are soft, but can be easily extended to support
a mix of soft and hard constraints (where the MCS must contain only soft
constraints). In this case, the MCS is only well defined if the hard constraints
are satisfiable.

1 Note that a minimal correction subset is a distinct concept from an unsatisfiable
core [11]. An unsatisfiable core is always unsatisfiable, but its complement
$F \setminus \mathrm{CORE}(F)$ is not guaranteed to be satisfiable; in contrast, an MCS may or
may not be satisfiable, but its complement is guaranteed to be satisfiable.

In Sect. 4, we will use a function COMPUTEMCS(Soft, Hard) that supports
both hard and soft constraints. COMPUTEMCS returns a minimal correction set
$C = \mathrm{MCS}(\mathit{Soft} \cup \mathit{Hard})$, with $C \subseteq \mathit{Soft}$. Our implementation of COMPUTEMCS
uses a simple binary search, similar to FastDiag [4] or Algorithm BFD from [14].
We add activation literals to the soft constraints to allow the underlying solver 
instance to be re-used incrementally while testing different subsets of soft con- 
straints for satisfiability. 
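
The contract of COMPUTEMCS can be sketched as follows. This is not the binary search described above: for brevity it uses a plain linear scan that grows a satisfiable subset one soft constraint at a time, and it replaces the incremental solver with activation literals by a brute-force `is_sat` helper. Both helpers and the clause encoding are assumptions made for illustration.

```python
from itertools import product

def is_sat(clauses):
    """Brute-force SAT check; a clause is a tuple of non-zero ints (DIMACS style)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def compute_mcs(soft, hard):
    """Return an MCS of soft + hard that contains only soft constraints."""
    if not is_sat(hard):
        raise ValueError("hard constraints are unsatisfiable; the MCS is not well defined")
    kept = []  # grows into a maximal satisfiable subset of the soft constraints
    mcs = []   # soft constraints that had to be given up
    for constraint in soft:
        if is_sat(hard + kept + [constraint]):
            kept.append(constraint)
        else:
            mcs.append(constraint)
    return mcs

# Hypothetical example: hard constraint (x2 implies x1); soft constraints x1, not x1, x2.
hard = [(-2, 1)]
print(compute_mcs([(1,), (-1,), (2,)], hard))
print(compute_mcs([(-1,), (1,), (2,)], hard))
```

The two calls differ only in the order of the soft constraints, yet they return correction sets of different sizes. Each is minimal in the sense of Definition 2, which illustrates why the result of such an algorithm is not necessarily the smallest MCS.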


2.3 Network Reachability 


We use the SMT-encoding of AWS VPC network semantics previously described 
in Tiros [2]. In this section, we briefly review this graph-based encoding; we refer 
readers to [2] for more details. 

We take as input a configuration describing one or more user VPCs, and a 
user-specified reachability query, consisting of a source and destination compo- 
nent in the VPC. For example, the source of the query may be an internet gate- 
way, and the destination may be an EC2 Instance. A query may also optionally 
specify additional constraints, such as the protocol, a range of source or destina- 
tion addresses or ports for the packet, or an intermediate component that must 
(or must not) be on the path. 


[Figure 1: symbolic graph of a VPC with nodes for an EC2 Instance, Network Interfaces (private-ip: 10.0.1.15 and private-ip: 10.0.2.26), Subnets (cidr: 10.0.2.0/24), Route Tables, a NAT gateway (public-ip: 53.0.0.2), and an Internet Gateway. Packet header fields: protocol bv:8, srcAdr bv:32, dstAdr bv:32, srcPort bv:16, dstPort bv:16. Example constraints: $(\mathit{dstAdr} \neq 10.0.1.15) \Rightarrow \neg \mathit{edge}_1$ and $(\mathit{srcAdr} \neq 10.0.1.15) \Rightarrow \neg \mathit{edge}_2$.]


Fig. 1. Simplified example symbolic graph representation of a VPC (left), with sym- 
bolic packet header consisting of bitvectors (right). Edges in the graph are associated 
with theory atoms, and are traversable only if those atoms are assigned true. Two 
example constraints, enforcing that a network interface is only accessible if the packet 
is addressed to/from that interface are shown. These constraints relate edge atoms in 
the symbolic graph to the bitvectors in the symbolic packet header to enforce AWS 
VPC semantics. 

We encode VPC configurations as constrained symbolic graphs using the
SMT solver MONOSAT [3], with fixed-width bitvectors representing the pro- 
tocol, port, and addressing information in a symbolic packet header. Figure 1 
shows a symbolic graph along with a packet header and example constraints. 

VPC components are represented as nodes in the symbolic graph. Each
component has semantics governing which packets it will accept; these semantics
are encoded as constraints that restrict which edges incident to that component's
node are traversable, depending on the assignment of the packet header
variables. A satisfying assignment to the full set of constraints corresponds to a 
feasible path. In such an assignment, the bitvector variable assignments provide 
the packet header(s) and the graph theory model provides a path of network 
component nodes connecting the source and destination of the user’s query. 
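
The relationship between edge constraints and feasible paths can be illustrated with a deliberately simplified sketch. Unlike the symbolic encoding described above, it checks a single concrete packet header instead of solving for one, and it uses a plain breadth-first search in place of a graph theory solver. The component names, the `Header` type, and the guard functions are hypothetical.

```python
from collections import deque
from typing import NamedTuple

class Header(NamedTuple):
    protocol: int
    src_adr: str
    dst_adr: str
    src_port: int
    dst_port: int

def find_path(edges, header, source, destination):
    """Breadth-first search over the edges whose guard accepts `header`."""
    successors = {}
    for start, end, guard in edges:
        if guard(header):  # an edge is traversable only if its constraint holds
            successors.setdefault(start, []).append(end)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# The interface only accepts packets addressed to it, as in the example
# constraint of Fig. 1.
EDGES = [
    ("internet-gateway", "network-interface", lambda h: h.dst_adr == "10.0.1.15"),
    ("network-interface", "ec2-instance", lambda h: True),
]
to_instance = Header(protocol=6, src_adr="205.251.242.103", dst_adr="10.0.1.15",
                     src_port=443, dst_port=13357)
elsewhere = to_instance._replace(dst_adr="10.0.1.99")
print(find_path(EDGES, to_instance, "internet-gateway", "ec2-instance"))
print(find_path(EDGES, elsewhere, "internet-gateway", "ec2-instance"))
```

In the symbolic setting the header fields are bitvector variables, so the solver searches over headers and paths at the same time rather than testing one header as this sketch does.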

Some components (such as NAT gateways) transform and retransmit packets. 
TIROS supports this by unrolling the VPC configuration graph into multiple 
copies with separate packet header variables. Edges from packet-transforming 
components connect to their components in the next unrolled section of the 
graph. TIROS unrolls the graph to a sufficient depth to model the behavior of 
the components for each query. 

Query source and destination reachability is enforced with a single graph 
theory reachability predicate requiring a feasible path in the VPC configuration 
graph from the source to the destination of the query. Query restrictions requir- 
ing intermediate components are enforced using additional reachability predi- 
cates. Query restrictions requiring that a given resource not occur on a path are 
enforced by excluding that resource from the VPC configuration graph repre- 
sentation. Packet header restrictions are enforced using bitvector constraints. 

If the constraints are satisfiable, TIROS extracts a reachable path satisfying 
the query from the satisfying assignment to the constraints. In the next section, 
we will discuss how we extend TIROS to also provide diagnostic feedback in the 
case where the constraints are unsatisfiable. 


3 Blocked Paths for Network Configuration Diagnosis 


We introduce the notion of a blocked path for analyzing infeasible network connections.
As shown in Fig. 2, a blocked path is an infeasible but otherwise complete
path from a source to a destination, in which one or more edges or nodes are 
annotated with blocking reasons: configuration settings or network semantics 
that explain why that transition in the path is infeasible. 

Unlike a live packet trace, a blocked path continues past components that 
reject or redirect the packet so as to reach the user’s intended destination, poten- 
tially transiting through multiple infeasible steps along the way. 


Definition 3 (Blocked path). 


1. A blocked path is a complete (but infeasible) path from a source to a destina- 
tion in a network, satisfying the user’s query. 

2. A blocked path is actionable: it is a path that could, with the right control
plane configuration adjustments, be a feasible path. 

3. A blocked path identifies a sufficient set of blocking reasons (network seman- 
tics or control-plane settings) that would need to be addressed to admit the 
packet along that blocked path. This may include multiple blocking reasons 
along the path, as opposed to just the first blocking reason. 

[Figure 2: two blocked paths, each annotated with the packet header at every step (Protocol: TCP, DstAdr: 205.251.242.103, DstPort: 443).
First path: EC2 Instance (private-ip: 10.0.1.15) -> Security Group (local traffic only) -> Route table -> Internet Gateway.
Second path: EC2 Instance (private-ip: 10.0.1.15) -> Security Group (local traffic only) -> Route table -> NAT gateway (private-ip: 10.0.2.26, public-ip: 53.0.0.2) -> Route table -> Internet Gateway, with SrcAdr changing from 10.0.1.15 to 10.0.2.26 and then to 53.0.0.2.
Blocking-reason label recovered from the figure: NO_ROUTE_TO_DESTINATION.]

Fig. 2. Two alternative blocked paths from an EC2 instance to an internet gateway. 
These blocked paths take different routes, and have different blocking reasons (shown 
in red) that explain why those paths are infeasible. In the first blocked path, there 
are two blocking reasons: the security group egress rule rejects packets destined for 
the Internet, and the internet gateway requires that the source instance must have a 
public IP address. Note that although the packet would be rejected by the security 
group, the blocked path continues past the security group to identify a complete (but 
infeasible) path to the internet gateway. The second blocked path transitions through 
an intermediate NAT gateway, which satisfies the security group rule and also has a 
public IP address. However, this path is still blocked, because the route table does not 
have an applicable route to the NAT gateway. 


Validating User Intent 


Showing a complete path from the source to destination, along with all the rele- 
vant configuration settings blocking that path, allows users to confirm that this 
course of action matches their intended network behavior before making any 
changes. However, in many cases there are multiple ways to adjust a configura- 
tion to admit a path, resulting in different blocked paths. 
For example, Fig. 2 shows two example blocked paths to an internet gate-
way from an EC2 instance lacking a public IP address. Our analysis might ini- 
tially produce for the user the shorter blocked path. Two remediation steps are 
required to admit this shorter path: The user must adjust the security group 
rule of the instance to admit egress packets to the public internet, and the user 
must also associate a public IP address with the source instance. Upon seeing 
the complete blocked path, the user may immediately determine that this would 
be the wrong solution for their network. 

If the proposed blocked path doesn’t match the user’s intent, we allow users 
to submit a refined query so as to generate an alternative blocked path. For 
instance, the user may specify allowed address or port ranges for the packet, or 
specify components that must or must not appear on the path. Similarly, the 
user may submit a refined query specifying that a NAT gateway must be an 
intermediate component on the path. In this case, we might produce the longer 
blocked path from Fig. 2. 


Actionable Blocked Paths 


In some cases, there may not exist any combination of VPC configuration adjust- 
ments that would allow a query to be satisfied. For example, under typical con- 
ditions in VPCs, route tables cannot be adjusted to redirect packets that are 
destined for a local address within the VPC. It is possible for users to specify 
queries that cannot be satisfied without violating this local route restriction. 

In principle, it is possible to derive a blocked path with non-user-configurable
blocking reasons; however, the resulting paths may behave in misleading or confusing
ways, and in general will not be possible for users to actually achieve
in any real configuration of their VPC. If possible, we want to ensure that the 
path contains only user-configurable blocking reasons, so that we produce an 
actionable finding for users. However, we still want to be able to provide useful 
diagnostics in cases where no actionable blocked path is possible (e.g., to explain 
to the user that the local route restriction will prevent their path). 

In Sect. 4, we describe how we determine when it is not possible to produce a 
blocked path without including non-configurable blocking reasons. In this case, 
we produce a partial path up to that first non-configurable blocking reason. 

Additionally, in some cases a user may specify a query that remains unsat- 
isfiable even if all of the network semantics in our model are relaxed. This can 
occur if the user specifies components that do not exist, or that are in isolated, 
disconnected networks (for which no relaxation of the edge constraints will admit 
a path). In this case, our blocked path analysis fails, and TIROS falls back on 
other techniques to produce diagnostic information. 

In Sect. 5 we show that in most cases, our analysis succeeds and produces an 
actionable blocked path. 


4 Deriving Blocked Paths from Unsatisfiable Queries


We group VPC configuration semantics into three disjoint sets of constraints,
$U$, $N$, and $H$. Set $U$ contains constraints that enforce user-configurable control-plane
settings (such as a user-defined route or firewall rule), while set $N$ contains
non-configurable but user-visible network semantics (such as the local route
restriction).

Set H contains elements of the constraints that are either not user-visible 
(such as internal implementation details) or that should never be relaxed (such 
as the reachability predicate or any other constraints defined by the user’s query). 
For example, many of our constraints involve containment comparisons between 
CIDRs and bitvectors representing IP addresses. An individual CIDR comparison
is encoded as a fresh literal representing the truth value of the comparison,
along with multiple clauses that enforce the comparison semantics. The interme- 
diate clauses that enforce the comparison semantics are implementation details 
that we include in set H, ensuring they are not included in the blocking reasons. 

When a query is unsatisfiable, we derive a blocked path and corresponding
blocking reasons from a Maximal Satisfiable Subset and Minimal Correction
Subset of $U \cup N \cup H$, with set $H$ being treated as hard constraints that must
not be included in the MCS.

If possible, we want to produce an MCS containing only configurable blocking
reasons from $U$. This ensures that the resulting blocked path is actionable. If we
directly compute the MCS of the full constraint set $U \cup N \cup H$, with both $U$ and
$N$ as soft constraints, non-configurable constraints from $N$ may be included in
the MCS even in cases where there exists an MCS containing only constraints
from $U$. On the other hand, we still want to be able to produce an MCS in the
case where the non-configurable and hard constraints ($N \cup H$) are, by themselves,
unsatisfiable.

In Algorithm 1, we resolve this by breaking the computation of the MCS into
two steps, initially computing an MCS of $N \cup H$, and only allowing constraints
from $N$ into the blocking reasons if $\mathrm{MCS}(N \cup H)$ is non-empty.

When $N \cup H$ is satisfiable, Algorithm 1 produces a blocked path that only
contains the configurable blocking reasons from $U$.

Algorithm 1 constructs two correction sets, $\mathrm{MCS}_N \subseteq N$ and $\mathrm{MCS}_U \subseteq U$,
with $\mathrm{MCS}_N \cup \mathrm{MCS}_U$ a valid MCS of $U \cup N \cup H$. We then extract a path $p$ from a
satisfying assignment to the corresponding MSS, $(U \cup N \cup H) \setminus (\mathrm{MCS}_N \cup \mathrm{MCS}_U)$.
Finally, as shown below, we return either a complete or a partial blocked path,
by associating blocking reasons from the MCS with nodes on that path.

Algorithm 1 relies on two helper methods, EXTRACTPATH and BUILDPATH. 
EXTRACTPATH retrieves the satisfying theory model (a sequence of edges) for 
the query reachability predicate from a satisfiable formula, using the graph theory
in the SMT-solver MONOSAT, and associates packet header assignments with 
each step of that path from the corresponding bitvector assignments. BUILDPATH 
maps the literals of the MCS to descriptive strings representing blocking reasons, 
and associates those strings with steps on the blocked path. 

Algorithm 1. Blocked Path Analysis

1: function DERIVEBLOCKEDPATH($U$, $N$, $H$)    ▷ Precondition: $U \cup N \cup H$ is UNSAT.
2:     if UNSAT($H$) then
3:         throw Error: No blocked path can be produced.
4:     else
5:         // Note: If $N \cup H$ is SAT, then $\mathrm{MCS}_N = \emptyset$.
6:         $\mathrm{MCS}_N \leftarrow$ COMPUTEMCS($N$, $H$)
7:         // Note: $(N \cup H) \setminus \mathrm{MCS}_N$ is SAT; $\mathrm{MCS}_U$ is well-defined.
8:         $\mathrm{MCS}_U \leftarrow$ COMPUTEMCS($U$, $(N \cup H) \setminus \mathrm{MCS}_N$)
9:         $p \leftarrow$ EXTRACTPATH($(U \cup N \cup H) \setminus (\mathrm{MCS}_N \cup \mathrm{MCS}_U)$)
10:        return BUILDPATH($p$, $\mathrm{MCS}_N$, $\mathrm{MCS}_U$)
11:    end if
12: end function
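
The two-step structure of Algorithm 1 can be sketched in a few lines. This is an illustration under simplifying assumptions, not the production implementation: constraints are propositional clauses, `solve` is a brute-force solver, `compute_mcs` is a linear scan rather than a binary search, and the returned model stands in for EXTRACTPATH and BUILDPATH. All function names and the example clauses are hypothetical.

```python
from itertools import product

def solve(clauses):
    """Brute-force solver: return a satisfying assignment (dict) or None."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return model
    return None

def compute_mcs(soft, hard):
    """Linear-scan MCS of soft + hard, restricted to soft constraints."""
    kept, mcs = [], []
    for constraint in soft:
        if solve(hard + kept + [constraint]) is not None:
            kept.append(constraint)
        else:
            mcs.append(constraint)
    return mcs

def derive_blocked_path(u, n, h):
    """Two-step MCS computation following the structure of Algorithm 1."""
    if solve(h) is None:
        raise ValueError("no blocked path can be produced")
    mcs_n = compute_mcs(n, h)                    # empty whenever N + H is SAT
    hard = h + [c for c in n if c not in mcs_n]  # (N + H) minus MCS_N, now SAT
    mcs_u = compute_mcs(u, hard)
    mss = hard + [c for c in u if c not in mcs_u]
    return {"model": solve(mss), "mcs_n": mcs_n, "mcs_u": mcs_u,
            "complete": not mcs_n}

# Variable 1: the security-group edge is traversable; variable 2: the route-table
# edge is traversable. The query (hard) needs both; a user rule blocks edge 1.
H = [(1,), (2,)]
U = [(-1,)]
print(derive_blocked_path(U, [], H))
print(derive_blocked_path(U, [(-2,)], H))
```

In the first call only the configurable rule is reported, so the blocked path is complete. In the second call a non-configurable constraint also blocks the path, so the result is flagged as partial.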


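We can see that $\mathrm{MCS}_N \cup \mathrm{MCS}_U$ meets the definition of a minimal correction
set of $U \cup N \cup H$ by observing that:

\[
\begin{aligned}
& \mathrm{SAT}\big(((U \cup N \cup H) \setminus \mathrm{MCS}_N) \setminus \mathrm{MCS}_U\big) && \text{(line 8)} \\
\Rightarrow\ & \mathrm{SAT}\big((U \cup N \cup H) \setminus (\mathrm{MCS}_N \cup \mathrm{MCS}_U)\big) \\
& \forall c \in \mathrm{MCS}_N,\ \mathrm{UNSAT}\big((N \cup H) \setminus (\mathrm{MCS}_N \setminus \{c\})\big) && \text{(line 6)} \\
& \forall c \in \mathrm{MCS}_U,\ \mathrm{UNSAT}\big(((U \cup N \cup H) \setminus \mathrm{MCS}_N) \setminus (\mathrm{MCS}_U \setminus \{c\})\big) && \text{(line 8)} \\
\Rightarrow\ & \forall c \in (\mathrm{MCS}_N \cup \mathrm{MCS}_U),\ \mathrm{UNSAT}\big((U \cup N \cup H) \setminus ((\mathrm{MCS}_N \cup \mathrm{MCS}_U) \setminus \{c\})\big)
\end{aligned}
\]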

If $N \cup H$ is satisfiable, then $\mathrm{MCS}_N$ is empty and $\mathrm{MCS}_U$, containing only
configurable constraints, is an MCS of $U \cup N \cup H$. In this case, BUILDPATH
constructs a complete blocked path consisting entirely of configurable blocking
reasons.

If $N \cup H$ is unsatisfiable, then $\mathrm{MCS}_N$ is non-empty and $\mathrm{MCS}_N \cup \mathrm{MCS}_U$
contains at least one non-actionable constraint. In this case, the path $p$ may
behave unexpectedly and may not be realizable in a VPC configuration after
adjustment. If $\mathrm{MCS}_N$ is non-empty, BUILDPATH forms the blocked path as
above, but returns only the prefix of that blocked path up to and including the
first edge or node associated with a non-actionable setting.

Above, we discussed the cases where $N \cup H$ is satisfiable or unsatisfiable.
There is also a third possibility: The hard constraints H, representing the con- 
straints enforcing the user’s query or implementation details of our model, may 
by themselves be unsatisfiable. For example, H may be unsatisfiable if the user 
specifies a source and destination that are in separate, disconnected networks. 

If H is unsatisfiable, Algorithm 1 fails, and is unable to produce even a 
partial blocked path. In this case, we fall back on other techniques to provide 
useful diagnostic information for users. In practice, the typical reason that H is 
unsatisfiable is that the source and destination are in disconnected VPCs (so the 
reachability constraint is unsatisfiable). We use a static analysis pass to identify 
this case and handle it separately in our service. 

In the case that Algorithm 1 produces a complete (resp. partial) blocked
path, the underlying MCS algorithm guarantees that its set of blocking reasons
is minimal: relaxing all but one of them still leaves the query unsatisfiable.
As noted in Sect. 2.2, a minimal set is not necessarily the smallest one, and in
general this blocked path is not unique.

In our implementation of Algorithm 1, the graph-based decision heuristic in 
MONOSAT will prioritize finding shortest-length paths in most cases, but does
not guarantee that a shortest-length path is always found. 


5 Evaluation 


VPC REACHABILITY ANALYZER, a commercial offering available from AWS 
since December 2020, uses the blocked path analysis we have described to derive 
findings for queries between unreachable endpoints. 

To demonstrate the practical impact of this blocked path analysis, we ran- 
domly selected 1000 unreachable queries processed by VPC REACHABILITY 
ANALYZER. We executed the blocked path analysis for those queries on an 
‘m5.24xlarge’ EC2 instance using GNU Parallel [15], running Amazon Linux 
2, using MONOSAT version 1.6.0. 


Fig. 3. Number of blocking reasons per blocked path (among the 63% of unreachable
queries for which the blocked path analysis produced a complete blocked path). 97%
of blocked paths have three or fewer blocking reasons; 60% have just a single
blocking reason.


Excluding the time to complete the blocked path analysis, the average time
required to initially determine satisfiability of the constraints was 2.1 s (P50:
1.7 s, P99: 7.4 s). The blocked path analysis was as fast as or faster than the initial
solving time, requiring 0.3 s on average (P50: 0.05 s, P99: 6.6 s).

As described in Sect. 4, in some cases, the blocked path analysis can pro- 
duce only a partial path, or no results at all. Of those 1000 unreachable queries, 
63.2% resulted in complete blocked paths, 7.4% resulted in partial blocked paths, 
and the remainder (29.4%) produced no analysis (in which case VPC REACHABILITY
ANALYZER applies other techniques so that it can still provide useful
diagnostics).

As can be seen in Fig. 3, most blocked paths have just one blocking reason,
and 97% have at most three. This demonstrates that our analysis produces 
actionable, concise findings on real production data, a key requirement of a 
useful diagnosis service. 


6 Conclusion 


The blocked path analysis we have introduced provides key advantages over 
previous network diagnostic techniques. By showing users a blocked path from 
a source to a destination, we allow users the opportunity to refine their query
such that their intended path is aligned with our analysis. Furthermore, showing 
all blocking reasons on a blocked path allows users to understand the VPC 
configuration adjustments necessary to realize a path for their query. 

Our blocked path analysis is a fully static analysis (requiring no packets to be 
injected into the network), can be computed efficiently using standard techniques 
from the formal methods literature, and is now used successfully in production 
by VPC REACHABILITY ANALYZER. 


Abstract. This paper presents a new framework to synthesize lower-
bounds on the worst-case cost for non-deterministic integer loops. As 
in previous approaches, the analysis searches for a metering function 
that under-approximates the number of loop iterations. The key novelty 
of our framework is the specialization of loops, which is achieved by 
restricting their enabled transitions to a subset of the inputs combined 
with the narrowing of their transition scopes. Specialization allows us 
to find metering functions for complex loops that could not be handled 
before or be more precise than previous approaches. Technically, it is 
performed (1) by using quasi-invariants while searching for the metering 
function, (2) by strengthening the loop guards, and (3) by narrowing the 
space of non-deterministic choices. We also propose a Max-SMT encoding 
that takes advantage of the use of soft constraints to force the solver to look
for more accurate solutions. We show our accuracy gains on benchmarks 
extracted from the 2020 Termination and Complexity Competition by 
comparing our results to those obtained by the LoAT system. 


1 Introduction 


One of the most important problems in program analysis is to automatically, and
accurately, bound the cost of a program's executions. The first automated analysis
was developed in the 70s [24] for a strict functional language and, since then,
a plethora of techniques has been introduced to handle the peculiarities of the
different programming languages (see, e.g., for Integer programs [5], for Java-like
languages [2,19], for concurrent and distributed languages [16], for probabilistic
programs [15,18], etc.) and to increase their accuracy (see, e.g., [10,14,21,22]).
The vast majority of these techniques have focused on inferring upper bounds on
the worst-case cost, since having the assurance that no execution of the program
will exceed the inferred amount of resources (e.g., time, memory, etc.) has
crucial applications in safety-critical contexts. On the other hand, lower bounds
on the best-case cost characterize the minimal cost of any program execution
and are useful in task parallelization (see, e.g., [3,9,10]). There is a third type
of important bound, which is the focus of this work: lower bounds on the worst-case
cost, which bound the worst-case cost from below. Their main application
is that, together with upper bounds on the worst case, they allow us to infer tighter
worst-case cost bounds (when they coincide, the inferred cost is exact), which
can be crucial in safety-critical contexts. Besides, lower bounds on
the worst-case cost give us families of inputs that lead to an expensive cost,
which could be used to detect performance bugs. In what follows, we use the
acronyms $LB^w$ and $LB^b$ to refer to worst-case and best-case lower bounds, resp.


State-of-the-Art in $LB^w$. An important difference between $LB^w$ and $LB^b$ is
that, while the best case must consider all program runs, $LB^w$ holds for (usually
infinite) families of the most expensive program executions. This is why the
techniques applicable to $LB^b$ inference (e.g., [3,9,10]) are not useful for $LB^w$
in general, since they would provide overly inaccurate (low) results. The state of
the art in $LB^w$ inference is [12,13] (implemented in the LoAT system), which
introduces a variation of ranking functions, called metering functions, to under-estimate
the number of iterations of simple loops, i.e., loops without branching
or nested loops. The core of this method is a simplification technique that allows
treating general loops (with branching and nested loops) by using so-called
acceleration, which replaces a transition representing one loop iteration by another
rule that collects the effect of applying several consecutive loop iterations using
the original rule. Asymptotic lower bounds are then deduced from the resulting
simplified programs using a special-purpose calculus and an SMT encoding.


Motivation. Our work is motivated by the limitation of state-of-the-art methods
when, by treating each simple loop separately, an $LB^w$ bound cannot be found
or is too imprecise. For example, consider the interleaved loop in Fig. 1, which
is a simplification of the benchmark SimpleMultiple.koat from the Termination
and Complexity competition. Its transition system appears to the right (the
transition system is like a control-flow graph (CFG) in which the transitions $\tau$
are labeled with the applicability conditions and with the updates for the variables;
primed variables denote the updated values). In every iteration $x$ or $y$ can
decrease by one, and these behaviors can interleave. The worst case is obtained,
for instance, when $x$ is decreased to 0 ($x$ iterations) and then $y$ is decreased to 0
($y$ iterations), resulting in $x + y$ iterations, or when $y$ is first decreased to 1 and
then $x$ to $-1$, etc. The approach in [12,13] accelerates $\tau_1$ and $\tau_2$
independently, resulting in the accelerated versions
$\tau_1^a = x \ge -1 \land y > 0 \land x' = -1 \land y' = y$ with cost $x + 1$ and
$\tau_2^a = x \ge 0 \land y > 0 \land x' = x \land y' = 0$ with cost $y$. After applying
one accelerated version, the other accelerated version can no longer be
applied because of the final values of the variables. Thus, the overall knowledge
extracted from the loop is that it can iterate $x+1$ or $y$ times, whereas the precise
$LB^w$ is $x+y$ iterations. Our challenge for inferring more precise $LB^w$ is to devise
a method that can handle all loop transitions simultaneously, as disconnecting
them leads to a loss of semantics that cannot be recovered by acceleration.
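
The iteration counts discussed above can be reproduced with a small simulation of the interleaved loop. This is only an illustrative sketch: the `run` and `worst_case` helpers and the sample input are assumptions, and the exhaustive exploration is unrelated to how LoAT or the approach of this paper infers bounds.

```python
from functools import lru_cache

def run(x, y, decrement_x):
    """Run the loop; decrement_x(x, y) resolves the non-deterministic choice (*)."""
    iterations = 0
    while x >= 0 and y > 0:
        if decrement_x(x, y):
            x -= 1
        else:
            y -= 1
        iterations += 1
    return iterations

@lru_cache(maxsize=None)
def worst_case(x, y):
    """Exact worst-case number of iterations, found by exploring both choices."""
    if not (x >= 0 and y > 0):
        return 0
    return 1 + max(worst_case(x - 1, y), worst_case(x, y - 1))

x0, y0 = 5, 3
only_x = run(x0, y0, lambda x, y: True)        # always decrement x: x + 1 iterations
only_y = run(x0, y0, lambda x, y: False)       # always decrement y: y iterations
x_then_y = run(x0, y0, lambda x, y: x > 0)     # x down to 0, then y down to 0
print(only_x, only_y, x_then_y, worst_case(x0, y0))
```

For the sample input, the two single-transition schedules yield $x + 1$ and $y$ iterations, whereas the interleaved schedule reaches the exact worst case of $x + y$.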

    while (x >= 0 && y > 0) {
        if (*) {
            x = x - 1;
        } else {
            y = y - 1;
        }
    }

    $\tau_0$: $x' = x \land y' = y$
    $\tau_1$: $x \ge 0 \land y > 0 \land x' = x - 1 \land y' = y$
    $\tau_2$: $x \ge 0 \land y > 0 \land x' = x \land y' = y - 1$
    $\tau_3$: $y \le 0 \land x' = x \land y' = y$
    $\tau_4$: $x < 0 \land x' = x \land y' = y$


Fig. 1. Interleaved loop (left) and its representation as a transition system (right) 


Non-Termination and $LB^w$. Our work is inspired by [17], which introduces the
powerful concept of quasi-invariant to find witnesses for non-termination. A
quasi-invariant is an invariant which does not necessarily hold on initialization,
and can be found as in template-based verification [23]. Intuitively, when there
is a loop in the program that can be mapped to a quasi-invariant that forbids
executing any of the outgoing transitions of the loop, then the program is non-terminating.
This paper carries this powerful use of quasi-invariants and Max-SMT
over from non-termination analysis to the more difficult problem of $LB^w$ inference.
Non-termination and $LB^w$ are indeed related properties: in both cases we need
to find witnesses, resp., for the non-termination of the loop and for executing at least
a certain number of iterations. For $LB^w$, we additionally need to provide such an
under-estimation of the number of iterations and search for $LB^w$ behaviors that
occur for a class of inputs rather than for a single input instantiation (since the
$LB^w$ for a single input is a concrete (i.e., constant) cost, rather than the parametric
$LB^w$ function we are searching for). In contrast, for non-termination, it is enough
to find a non-terminating input instantiation.


Our Approach. A fundamental idea of our approach is to specialize loops in
order to guide the search for the metering functions of complex loops, avoiding the
inaccuracy introduced by disconnecting them into simple loops. To this end,
we propose specializing loops by combining the addition of constraints to their
transitions with the restriction of the valid states by means of quasi-invariants.
For instance, for the loop in Fig. 1, our approach automatically narrows $\tau_1$ by
adding $x > 0$ (so that $x$ is decreased until $x = 0$) and $\tau_2$ by adding $x \le 0$ (so that
$\tau_2$ can only be applied when $x = 0$). This specialized loop has lost many of the
possible interleavings of the original loop but keeps the worst-case execution of
$x+y$ iterations. These specialized guards do not guarantee that the loop executes
$x + y$ iterations in every possible state, as the loop will finish immediately for
$x < 0$ or $y \le 0$; thus our approach also infers the quasi-invariant $x \ge 0 \land x \le y$.
Combining the specialized guards and the quasi-invariant, we can ensure that,
when reaching the loop in a valid state according to the quasi-invariant, $x + y$
is a lower bound on the number of iterations of the loop, i.e., its cost. Using
quasi-invariants that include all (invariant) inequalities syntactically appearing
in loop transitions might work for the case of loops with a single path. However, for
the general case, the specialized guards usually lead to essential quasi-invariants
that do not appear in the original loop. The specialization achieved by adding
constraints could also be applied in the context of non-termination to increase
the accuracy of [17], as only quasi-invariants were used there. Therefore, we argue that
our work avoids the precision loss caused by the simplification in [12,13] and,
besides, introduces a loop specialization technique that can also be applied to
gain precision in non-termination analysis [17].
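
The role of the quasi-invariant can be checked by brute force on a bounded grid of states. The sketch below reads the specialized guards as $x > 0$ for $\tau_1$ and $x \le 0$ for $\tau_2$, and the quasi-invariant as $x \ge 0 \land x \le y$. The three conditions it tests (the invariant is preserved, the candidate function decreases by at most one per step, and it is non-positive when no transition is enabled) are a simplified paraphrase of what a metering function must satisfy, assumed here for illustration; this is not the Max-SMT encoding of Sect. 4.

```python
def specialized_step(x, y):
    """Successor states under the specialized guards of tau_1 and tau_2."""
    successors = []
    if x >= 0 and y > 0 and x > 0:    # tau_1 narrowed with x > 0
        successors.append((x - 1, y))
    if x >= 0 and y > 0 and x <= 0:   # tau_2 narrowed with x <= 0
        successors.append((x, y - 1))
    return successors

def quasi_invariant(x, y):
    return x >= 0 and x <= y

def metering(x, y):
    return x + y

def check(invariant, bound):
    """Check the three conditions on every state with |x|, |y| <= bound."""
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            if not invariant(x, y):
                continue
            successors = specialized_step(x, y)
            if not successors and metering(x, y) > 0:
                return False          # loop stops although iterations are promised
            for nx, ny in successors:
                if not invariant(nx, ny):
                    return False      # invariant is not preserved
                if metering(x, y) - metering(nx, ny) > 1:
                    return False      # candidate decreases too fast
    return True

print(check(quasi_invariant, 20))
print(check(lambda x, y: True, 20))
```

With the quasi-invariant, all three conditions hold on the grid. Without it, the check fails on states such as $x = 3, y = 0$, where the loop exits immediately although $x + y$ is positive.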


Contributions. Briefly, our main theoretical and practical contributions are: 


1. In Sect. 3 we introduce several semantic specializations of loops that enable 
the inference of local metering functions for complex loops by: (1) restricting 
the input space by means of automatically generated quasi-invariants, (2) 
narrowing transition guards and (3) narrowing non-deterministic choices. 

2. We propose a template-based method in Sect. 4 to automate our technique 
which is effectively implemented by means of a Max-SMT encoding. Whereas 
the use of templates is not new [6], our encoding has several novel aspects 
that are devised to produce better lower-bounds, e.g., the addition of (soft) 
constraints that force the solver to look for larger lower-bound functions.

3. We implement our approach in the LOBER system and evaluate it on bench- 
marks from the Integer Transition Systems category of the 2020 Termination 
and Complexity Competition (see Sect. 5). Our experimental results when 
compared to the existing system LoAT [12] are promising: they show further 
accuracy of LOBER in challenging examples that contain complex loops. 


2 Background 


This section introduces some notation on the program representation and recalls 
the notion of $LB^w$ we aim at inferring.


2.1 Program Representation 


Our technique is applicable to sequential non-deterministic programs with inte- 
ger variables and commands whose updates can be expressed in linear (inte- 
ger) arithmetic. We assume that the non-determinism originates from non- 
deterministic assignments of the form “x:=nondet();”, where x is a program 
variable and nondet() can be represented by a fresh non-deterministic variable 
u. This assumption allows us to also cover non-deterministic branching, e.g., 
“if (*){..} else {..}” as it can be expressed by introducing a non-deterministic 
variable u and rewriting the code as “u:=nondet(); if (u>0){..} else {..}”. 

Our programs are represented using transition systems, in particular using 
the formalization of [17] that simplifies the presentation of some formal aspects 
of our work. A transition system (abbrev. TS) is a tuple 
$S = (\bar{x}, \bar{u}, \mathcal{L}, T, \Theta)$, where $\bar{x}$ is a tuple of $n$ 
integer program variables, $\bar{u}$ is a tuple of integer (non-deterministic) 
variables, $\mathcal{L}$ is a set of locations, $T$ is a set of transitions, and 
$\Theta$ is a formula that defines the valid input and is specified by a conjunction 
of linear constraints of the form $\bar{a} \cdot \bar{x} + b \diamond 0$ where 
$\diamond \in \{>, <, =, \geq, \leq\}$. A transition is of the form 
$(\ell, \ell', R) \in T$ such that $\ell, \ell' \in \mathcal{L}$, and $R$ is a formula 
over $\bar{x}$, $\bar{u}$ and $\bar{x}'$ that is specified by a conjunction of linear 
constraints of the form 
$\bar{a} \cdot \bar{x} + \bar{b} \cdot \bar{u} + \bar{c} \cdot \bar{x}' + d \diamond 0$ 
where $\diamond \in \{>, <, =, \geq, \leq\}$, and primed variables $\bar{x}'$ represent 
the values of the corresponding unprimed variables after the transition. We sometimes 
write $R$ as $R(\bar{x}, \bar{u}, \bar{x}')$, use $R(\bar{x})$ to refer to the 
constraints that involve only variables $\bar{x}$ (i.e., the guard), and use 
$R(\bar{x}, \bar{u})$ to refer to the constraints that involve only variables 
$\bar{u}$ and (possibly) $\bar{x}$. W.l.o.g., we may assume that constraints involving 
primed variables are of the form $x_i' = \bar{a} \cdot \bar{x} + \bar{b} \cdot \bar{u} + c$. 
This is because non-determinism can be moved to $R(\bar{x}, \bar{u})$: if a primed 
variable $x_i'$ appears in any expression that is not of this form, we replace $x_i'$ 
by a fresh non-deterministic variable $u_i$ in such expressions and add the equality 
$x_i' = u_i$. We require that for any $\bar{x}$ satisfying $R(\bar{x})$, there are 
$\bar{u}$ satisfying $R(\bar{x}, \bar{u})$, formally 


$$\forall \bar{x}\, \exists \bar{u}.\; R(\bar{x}) \rightarrow R(\bar{x}, \bar{u}) \qquad (1)$$


This guarantees that for any state $\bar{x}$ satisfying the condition, there are values for 
the non-deterministic variables $\bar{u}$ such that we can make progress. A transition 
that does not satisfy this condition is called invalid. Note that (1) does not refer 
to $\bar{x}'$ since they are set in a deterministic way, once the values of $\bar{x}$ and 
$\bar{u}$ are fixed. W.l.o.g., we assume that all coefficients and free constants, in all 
linear constraints, are integer; and that there is a single initial location 
$\ell_0 \in \mathcal{L}$ with no incoming transitions, and a single final location 
$\ell_e$ with no outgoing transitions. 


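Example 1. The TS graphically presented in Fig. 1 is expressed as follows, 
considering that all inputs are valid ($\Theta = true$): 


$$\begin{array}{l}
S = (\, \{x, y\},\ \emptyset,\ \{\ell_0, \ell_1, \ell_e\}, \\
\quad \{ (\ell_0, \ell_1,\ x' = x \wedge y' = y), \\
\quad\ \ (\ell_1, \ell_1,\ x \geq 0 \wedge y > 0 \wedge x' = x - 1 \wedge y' = y), \\
\quad\ \ (\ell_1, \ell_e,\ x < 0 \wedge x' = x \wedge y' = y), \\
\quad\ \ (\ell_1, \ell_e,\ y \leq 0 \wedge x' = x \wedge y' = y), \\
\quad\ \ (\ell_1, \ell_1,\ x \geq 0 \wedge y > 0 \wedge x' = x \wedge y' = y - 1) \},\ true \,)
\end{array}$$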


A configuration $C$ is a pair $(\ell, \sigma)$ where $\ell \in \mathcal{L}$ and 
$\sigma : \bar{x} \rightarrow \mathbb{Z}$ is a mapping representing a state. We abuse 
notation and use $\sigma$ to refer to $\bigwedge_{i=1}^{n} x_i = \sigma(x_i)$, 
and also write $\sigma'$ for the assignment obtained from $\sigma$ by renaming the 
variables to primed variables. There is a transition from $(\ell_1, \sigma_1)$ to 
$(\ell_2, \sigma_2)$ iff there is $(\ell_1, \ell_2, R) \in T$ such that 
$\exists \bar{u}.\ \sigma_1 \wedge \sigma_2' \models R$. A (valid) trace $t$ is a 
(possibly infinite) sequence of configurations $(\ell_0, \sigma_0), (\ell_1, \sigma_1), \ldots$ 
such that $\sigma_0 \models \Theta$, and for each $i$ there is a transition from 
$(\ell_i, \sigma_i)$ to $(\ell_{i+1}, \sigma_{i+1})$. Traces that are infinite or end 
in a configuration with location $\ell_e$ are called complete. A configuration 
$(\ell, \sigma)$, where $\ell \neq \ell_e$, is blocking iff 


$$\sigma \not\models \bigvee_{(\ell, \ell', R) \in T} R(\bar{x}) \qquad (2)$$


A TS is non-blocking if no trace includes a blocking configuration. We assume 
that the TS under consideration is non-blocking, and thus any trace is a prefix 
of a complete one. Throughout the paper, we represent a TS as a CFG, and 
analyze its strongly connected components (SCC) one by one. An SCC is said 
to be trivial if it has no edge. 
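
The notions of transition, configuration and blocking can be made concrete with a small executable sketch. The encoding below is only illustrative: it models the TS of Example 1 with Python closures for guards and updates (the names `TRANSITIONS`, `successors` and `is_blocking` are ours, not part of the formalization), and it assumes the loop guard $x \geq 0 \wedge y > 0$ with exits $x < 0$ and $y \leq 0$.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# (source, target, guard, update); an illustrative encoding of (l, l', R).
Transition = Tuple[str, str, Callable[[State], bool], Callable[[State], State]]

# Transitions of Example 1 (Theta = true, no non-deterministic variables).
TRANSITIONS: List[Transition] = [
    ("l0", "l1", lambda s: True, lambda s: dict(s)),
    ("l1", "l1", lambda s: s["x"] >= 0 and s["y"] > 0,
     lambda s: {"x": s["x"] - 1, "y": s["y"]}),
    ("l1", "le", lambda s: s["x"] < 0, lambda s: dict(s)),
    ("l1", "le", lambda s: s["y"] <= 0, lambda s: dict(s)),
    ("l1", "l1", lambda s: s["x"] >= 0 and s["y"] > 0,
     lambda s: {"x": s["x"], "y": s["y"] - 1}),
]


def successors(loc: str, state: State) -> List[Tuple[str, State]]:
    """All configurations reachable from (loc, state) in one transition."""
    return [(dst, update(state))
            for src, dst, guard, update in TRANSITIONS
            if src == loc and guard(state)]


def is_blocking(loc: str, state: State) -> bool:
    """A non-final configuration is blocking iff no guard is satisfied, cf. (2)."""
    return loc != "le" and not successors(loc, state)
```

On this encoding no configuration at `l0` or `l1` is blocking, since the two exit guards cover exactly the states in which the loop guard fails.
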


2.2 Lower-Bounds 


For simplicity, we assume that an execution step (a transition) costs 1. Under 
this assumption, the cost of a trace t is simply its length len(t) where the length 
of an infinite trace is $\infty$. In what follows, the set of all configurations is 
denoted by $\mathcal{C}$, the set of all valid complete traces (using a transition 
system $S$) when starting from configuration $C \in \mathcal{C}$ is denoted by 
$Traces_S(C)$, and $\mathbb{R}_{\geq 0} = \{k \in \mathbb{R} \mid k \geq 0\} \cup \{\infty\}$. 
For a non-empty set $M \subseteq \mathbb{R}_{\geq 0}$, $\sup M$ is the least upper 
bound of $M$ and $\inf M$ is the greatest lower bound of $M$. The worst-case cost 
of an initial configuration $C$ is the cost of the most expensive complete trace 
starting from $C$ and the best-case cost is the cost of the least expensive complete trace. 


Definition 1 (worst- and best-case cost). Let $S$ be a TS. Its worst-case 
cost function $wc_S : \mathcal{C} \rightarrow \mathbb{R}_{\geq 0}$ is 
$wc_S(C) = \sup \{len(t) \mid t \in Traces_S(C)\}$ and its best-case cost function 
$bc_S : \mathcal{C} \rightarrow \mathbb{R}_{\geq 0}$ is 
$bc_S(C) = \inf \{len(t) \mid t \in Traces_S(C)\}$. 


Clearly, $wc_S$ and $bc_S$ are not computable. Our goal in this paper is to 
automatically find a lower-bound function $\rho : \mathbb{Z}^n \rightarrow \mathbb{R}_{\geq 0}$ 
such that for any initial configuration $C = (\ell_0, \sigma_0)$ we have 
$wc_S(C) \geq \rho(\sigma_0(\bar{x}))$, i.e., it is an $LB^w$. An $LB^b$ 
would be a function $\rho : \mathbb{Z}^n \rightarrow \mathbb{R}_{\geq 0}$ that ensures 
that $bc_S(C) \geq \rho(\sigma_0(\bar{x}))$ for any initial configuration 
$C = (\ell_0, \sigma_0)$. In what follows, for a function $\rho(\bar{x})$, we let 
$\|\rho(\bar{x})\| = \lceil \max(0, \rho(\bar{x})) \rceil$ to map all negative 
valuations of $\rho$ to zero. 


Example 2. Consider the TS $S = (\{x\}, \{u\}, \{\ell_0, \ell_1, \ell_e\}, T, true)$ with transitions: 


$$\begin{array}{l}
T = \{\, \tau_1 = (\ell_0, \ell_1,\ x' = x), \\
\quad\ \ \tau_2 = (\ell_1, \ell_1,\ x > 0 \wedge x' = x - u \wedge u \geq 1 \wedge u \leq 2), \\
\quad\ \ \tau_3 = (\ell_1, \ell_e,\ x \leq 0 \wedge x' = x) \,\}
\end{array}$$


$S$ contains a loop at $\ell_1$ where variable $x$ is non-deterministically decreased by 1 
or 2. From any initial configuration $C_0 = (\ell_0, \sigma_0)$, the longest possible complete 
trace decreases $x$ by 1 in every iteration with $\tau_2$, therefore 
$wc_S(C_0) = \|\sigma_0(x)\| + 2$ because of the $\|\sigma_0(x)\|$ iterations in $\ell_1$ 
plus the cost of $\tau_1$ and $\tau_3$. The most precise lower bound for $wc_S$ is 
$\rho(x) = \|x\| + 2$, although $\rho(x) = \|x\|$ or $\rho(x) = \|x - 2\|$ 
are also valid lower bounds. The shortest complete trace from $C_0$ decreases $x$ 
by 2 in every iteration, so $bc_S(C_0) = \|\frac{\sigma_0(x)}{2}\| + 2$. There are several 
valid lower bounds for $bc_S(C_0)$ like $\rho(x) = \|\frac{x}{2}\| + 2$, 
$\rho(x) = \|\frac{x}{2}\|$, or $\rho(x) = 2$. 
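
The closed forms of Example 2 can be cross-checked by exhaustively exploring the non-deterministic choices. The sketch below is a brute-force enumeration and not part of the analysis; it assumes the transitions $\tau_1 = (\ell_0, \ell_1, x' = x)$, $\tau_2$ with guard $x > 0$ and $u \in \{1, 2\}$, and $\tau_3$ with guard $x \leq 0$. The helper names `norm`, `loop_costs`, `wc` and `bc` are ours.

```python
import math
from functools import lru_cache


def norm(value):
    """||value|| = ceil(max(0, value)), as defined in Sect. 2.2."""
    return math.ceil(max(0, value))


@lru_cache(maxsize=None)
def loop_costs(x):
    """(longest, shortest) number of transitions from (l1, x) until le is reached."""
    if x <= 0:
        return (1, 1)  # only the exit transition tau3 is enabled
    options = [loop_costs(x - u) for u in (1, 2)]  # tau2 with u = 1 or u = 2
    longest = 1 + max(w for w, _ in options)
    shortest = 1 + min(b for _, b in options)
    return (longest, shortest)


def wc(x):
    """Worst-case cost from (l0, x): tau1 plus the longest run of the loop."""
    return 1 + loop_costs(x)[0]


def bc(x):
    """Best-case cost from (l0, x): tau1 plus the shortest run of the loop."""
    return 1 + loop_costs(x)[1]
```

For instance, `wc(3)` explores the run that always picks `u = 1`, while `bc(3)` corresponds to the run that always picks `u = 2`.
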


3 Local Lower-Bound Functions 


Focus on Local Bounds. Existing techniques and tools for cost analysis (e.g., [1, 
12]) work by inferring local (iteration) bounds for those parts of the TS that 
correspond to loops, and then combining these bounds by propagating them 
“backwards” to the entry point in order to obtain a global bound. For example, 
suppose that our program consists of the following two loops: 


assert (x>0 && z>0); 
while (z > 0) { x=x+z; z--; } 
while (x > 0) x--; 


where the second loop makes x iterations (when considering the value of x just 
before executing the loop), and the first loop makes z iterations and increments 
x by z in each iteration. We are interested in inferring a global function that 
describes the total number of iterations of both loops, in terms of the input values 
$x_0$ and $z_0$. While both loops have linear complexity locally, i.e., iteration bounds 
$z$ and $x$, the second one has quadratic complexity w.r.t. the initial values. This 
can be inferred automatically from the local bounds $z$ and $x$ by inferring how the 
value of $x$ changes in the first loop, and then rewriting $x$ in terms of the initial 
values to $x = x_0 + \frac{z_0 (z_0 + 1)}{2}$ (e.g., by solving corresponding recurrence relations). 
Now the global cost would be $x_0 + \frac{z_0 (z_0 + 1)}{2}$ plus the cost of the first loop $z_0$. Rewriting the 
loop bound x as above is done by propagating it backwards to the entry point, 
and there are several techniques in the literature for this purpose that can be 
directly adopted in our setting to produce global bounds. These techniques can 
infer global bounds for nested-loops as well, given the iteration bounds of each 
loop. Thus, we focus on inferring local lower-bounds on the number of iterations 
that non-nested loops (more precisely, parts of the TS that correspond to loops) 
can make, and assume that they can be rewritten to global bounds by adopting 
the existing techniques of [1,12] (our implementation indeed could be used as a 
black-box which provides local lower-bounds to these tools). Namely, we aim at 
inferring, for each non-nested loop, a function 
$\|\rho(\bar{x})\| = \lceil \max(0, \rho(\bar{x})) \rceil$ that is a 
(local) $LB^w$ on its number of iterations, i.e., whenever the loop is reached with 
values $\bar{v}$ for the variables $\bar{x}$, it is possible to make at least 
$\|\rho(\bar{v})\|$ iterations. 
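
As a sanity check of the backwards propagation described above, the two-loop program can be simulated and compared with the global expression obtained from the local bounds. This is a minimal sketch; the function names `total_iterations` and `global_bound` are hypothetical and only serve the illustration.

```python
def total_iterations(x0, z0):
    """Run both loops and count the iterations they perform in total."""
    assert x0 > 0 and z0 > 0
    x, z, steps = x0, z0, 0
    while z > 0:          # first loop: z0 iterations, x grows by z each time
        x, z, steps = x + z, z - 1, steps + 1
    while x > 0:          # second loop: one iteration per unit of x
        x, steps = x - 1, steps + 1
    return steps


def global_bound(x0, z0):
    """Local bound z of loop 1 plus local bound x of loop 2, rewritten in x0, z0."""
    return z0 + x0 + z0 * (z0 + 1) // 2
```

For example, with `x0 = 1` and `z0 = 2` the first loop makes 2 iterations and leaves `x = 4`, so both functions return 6.
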


Loops and TSs. For ease of presentation, we first consider a special case of TSs 
in which all locations, except the initial and exit ones, define loops, and Sect. 3.6 
explains how the techniques can be used for the general case. In particular, we 
consider that each non-trivial SCC consists of a single location $\ell$ and at least one 
transition, and we call it loop $\ell$. Transitions from $\ell$ to $\ell$ are called loop 
transitions and their guards are called loop guards, and transitions from $\ell$ to 
$\ell' \neq \ell$ are called exit transitions. The number of iterations of a loop $\ell$ 
in a trace $t$ is defined as the number of transitions from $\ell$ to $\ell$, which we 
refer to as the cost of loop $\ell$ as well 
(since we are assuming that the cost of transitions is always 1, see Sect. 2.2). 
The notions of best-case and worst-case cost in Definition 1 naturally extend to 
the cost of a loop $\ell$, i.e., we can ask what is the best-case and worst-case number 
of iterations of a given loop. 


Overview of the Section. The overall idea of our approach is to specialize each 
loop $\ell$, by restricting the initial values and/or adding constraints to its 
transitions, such that it becomes possible to obtain a metering function for the 
specialized loop. A function that is a $LB^b$ of the specialized loop is by definition 
a $LB^w$ of loop $\ell$, as it does not necessarily hold for all execution traces but rather 
for the class of restricted ones. Technically, inferring a $LB^b$ of a (specialized) loop 
is done by inferring a metering function $\rho$ [13], such that whenever the 
(specialized) loop is reached with a state $\sigma$, it is guaranteed to make at least 
$\|\rho(\sigma(\bar{x}))\|$ iterations. Besides, specialization is done in such a way 
that the TS obtained by 
putting all specialized loops together is non-blocking, i.e., there is an execution 
that is either non-terminating or reaches the exit location, and thus the cost of 
this execution is, roughly, the sum of the costs of all (specialized) loops that are 
traversed. The rest of this section is organized as follows. In Sect. 3.1 we general- 
ize the basic definition of metering function for simple loops from [12] to general 
types of loops and explore its limitations. Then, in the following 3 sections, we 
explain how to overcome these limitations by means of the following special- 
izations: using quasi-invariants to narrow the set of input values (Sect. 3.2); 
narrowing loop guards to make loop transitions mutually exclusive and force 
some execution order between them (Sect. 3.3); and narrowing the space of non- 
deterministic choices to force longer executions (Sect. 3.4). Sect. 3.5 states the 
conditions, to be satisfied when specializing loops, in order to guarantee that the 
TS obtained by putting all specialized loops together is non-blocking. 


3.1 Metering Functions 


Metering functions were introduced in [13] as a tool for inferring a lower-bound 
on the number of iterations that a given loop can make. The definition is analogous 
to that of (linear) ranking functions, which are often used to infer upper-bounds on 
the number of iterations. The definition as given in [13] considers a loop with 
a single transition, and assumes that the exit condition is the negation of its 
guard. We start by generalizing it to our notion of loop. 


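Definition 2 (Metering function). We say that a function $\rho_\ell$ is a metering 
function for a loop $\ell \in \mathcal{L}$, if the following conditions are satisfied 


$$\forall \bar{x}, \bar{u}, \bar{x}'.\; R \rightarrow \rho_\ell(\bar{x}) - \rho_\ell(\bar{x}') \leq 1 \quad \text{for each } (\ell, \ell, R) \in T \qquad (3)$$
$$\forall \bar{x}, \bar{u}, \bar{x}'.\; R \rightarrow \rho_\ell(\bar{x}) \leq 0 \quad \text{for each } (\ell, \ell', R) \in T \text{ with } \ell' \neq \ell \qquad (4)$$


Intuitively, Condition (3) requires $\rho_\ell$ to decrease at most by 1 in each iteration, 
and Condition (4) requires $\rho_\ell$ to be non-positive when leaving the loop. 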


Assuming $(\ell, \sigma)$ is a reachable configuration in $S$, it is easy to see that loop 
$\ell$ will make at least $\|\rho_\ell(\sigma(\bar{x}))\|$ iterations when starting from 
$(\ell, \sigma)$. We require $(\ell, \sigma)$ to be reachable in $S$ since we are interested 
only in non-blocking executions. Typically, we are interested in linear metering 
functions, i.e., of the form $\rho(\bar{x}) = \bar{a} \cdot \bar{x} + a_0$, since they are 
easier to infer and cover most loops in practice. Non-linear lower-bound functions will 
be obtained when rewriting these local linear lower-bounds in terms of the initial input 
at location $\ell_0$ (see beginning of Sect. 3) and by composing nested loops (see Sect. 3.6). 


Example 3 (Metering function). Consider the following loop on location $\ell_1$ that 
decreases $x$ ($\tau_1$) until it takes negative values and exits to $\ell_2$ ($\tau_2$): 


$$\tau_1 = (\ell_1, \ell_1,\ x \geq 0 \wedge x' = x - 1) \qquad \tau_2 = (\ell_1, \ell_2,\ x < 0 \wedge x' = x)$$


The function $\rho_{\ell_1}(x) = x + 1$ is a valid metering function because it decreases by 
exactly 1 in $\tau_1$ and becomes non-positive when $\tau_2$ is applicable 
($x < 0 \Rightarrow x + 1 \leq 0$, Condition (4) of Definition 2). The function 
$\rho'_{\ell_1}(x) = \frac{x}{2}$ is also metering because its value decreases by less than 1 
when applying $\tau_1$ ($\frac{x}{2} - \frac{x-1}{2} = \frac{1}{2} \leq 1$) 
and becomes non-positive in $\tau_2$. Even a function as $\rho''_{\ell_1}(x) = 0$ is trivially 
metering, as it satisfies (3) and (4). Although all of them are valid metering functions, 
$\rho_{\ell_1}(x)$ is preferable as it is more accurate (i.e., larger) and thus captures 
more precisely the number of iterations of the loop. Note that functions like 
$\rho^{*}_{\ell_1}(x) = 2x$ or $\rho^{**}_{\ell_1}(x) = x + 5$ are not metering because they 
do not verify (3) (because $2x - 2(x - 1) = 2 \not\leq 1$ for $\rho^{*}_{\ell_1}$) or (4) 
(because $x < 0 \not\Rightarrow x + 5 \leq 0$ for $\rho^{**}_{\ell_1}$). 


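
The claims of Example 3 can be tested on a bounded range of values. The following sketch is a finite check rather than a proof, and it assumes the loop guard $x \geq 0$ and the exit guard $x < 0$; `is_metering`, `iterations` and the candidate labels are names introduced only for this illustration.

```python
from fractions import Fraction


def is_metering(rho, lo=-20, hi=20):
    """Check Conditions (3) and (4) for the loop of Example 3 on lo..hi."""
    for x in range(lo, hi + 1):
        if x >= 0 and rho(x) - rho(x - 1) > 1:   # tau1: x >= 0, x' = x - 1
            return False
        if x < 0 and rho(x) > 0:                 # tau2: x < 0
            return False
    return True


def iterations(x):
    """Number of times tau1 fires when the loop is entered with value x."""
    count = 0
    while x >= 0:
        x, count = x - 1, count + 1
    return count


candidates = {
    "x + 1": lambda x: x + 1,
    "x / 2": lambda x: Fraction(x, 2),
    "0": lambda x: 0,
    "2x": lambda x: 2 * x,
    "x + 5": lambda x: x + 5,
}
results = {name: is_metering(rho) for name, rho in candidates.items()}
```

Exact rationals are used for `x / 2` so that the comparison with 1 is not affected by floating-point rounding.

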
3.2 Narrowing the Set of Input Values Using Quasi-Invariants 


Metering functions typically exist for loops with simple loop guards. However, 
when guards involve more than one inequality they usually do not exist in a 
simple (linear) form. This is because such loops often include several exit transi- 
tions with unrelated conditions, where each one corresponds to the negation of 
an inequality of the guard. It is unlikely then that a non-trivial (linear) function 
satisfies (4) for all exit transitions. This is illustrated in the next example. 


Example 4. Consider the following loop that iterates on $\ell_1$ if $x \geq 0 \wedge y > 0$, and 
exits when $x < 0$ or $y \leq 0$: 


$$\begin{array}{l}
\tau_1 = (\ell_1, \ell_1,\ x \geq 0 \wedge y > 0 \wedge x' = x - 1 \wedge y' = y) \\
\tau_2 = (\ell_1, \ell_2,\ x < 0 \wedge x' = x \wedge y' = y) \\
\tau_3 = (\ell_1, \ell_2,\ y \leq 0 \wedge x' = x \wedge y' = y)
\end{array}$$


Intuitively, this loop executes $x + 1$ transitions, but $\rho_{\ell_1}(x, y) = x + 1$ is not a 
valid metering function because it does not satisfy (4) for $\tau_3$: 
$y \leq 0 \not\Rightarrow x + 1 \leq 0$. 
Moreover, no other function depending on $x$ (e.g., $\frac{x}{2}$, $x - 2$, etc.) will be a valid 
metering function, as it will be impossible to prove (4) for $\tau_3$ only from the 
information $y \leq 0$ on its guard. The only valid metering function for this loop 
will be the trivial one $\rho_{\ell_1}(x, y) = c$ with $c \leq 0$, which does not provide any 
information about the number of iterations of the loop. 


Our proposal to overcome the imprecision discussed above is to consider 
only a subset of the input values s.t. conditions (3,4) hold in the context of the 
corresponding reachable states. For example, the reachable states might exclude 
some of the exit transitions, i.e., it is guaranteed that they are never used, and 
then (4) is not required to hold for them. A metering function in this context is a 
$LB^b$ of the loop when starting from that specific input, and thus it is a $LB^w$ (i.e., 
not necessarily best-case) of the loop when the input values are not restricted. 
Such subsets of the input values are described by means of quasi-invariants [17]. 
A quasi-invariant for a loop $\ell$ is a formula $Q_\ell$ over $\bar{x}$ such that 


$$\forall \bar{x}, \bar{u}, \bar{x}'.\; Q_\ell(\bar{x}) \wedge R \rightarrow Q_\ell(\bar{x}') \quad \text{for each } (\ell, \ell, R) \in T \qquad (5)$$
$$\exists \bar{x}.\; Q_\ell(\bar{x}) \qquad (6)$$


Intuitively, $Q_\ell$ is similar to an inductive invariant but without requiring it to 
hold on the initial states, i.e., once $Q_\ell$ holds it will hold during all subsequent 
visits to $\ell$. This also means that for executions that start in states within $Q_\ell$, it 
is guaranteed that $Q_\ell$ is an over-approximation of the reachable states. 
Condition (6) is used to avoid quasi-invariants that are false. Given a quasi-invariant 
$Q_\ell$ for $\ell$, we say that $\rho_\ell$ is a metering function for $\ell$ if the following holds 


$$\forall \bar{x}, \bar{u}, \bar{x}'.\; Q_\ell(\bar{x}) \wedge R \rightarrow \rho_\ell(\bar{x}) - \rho_\ell(\bar{x}') \leq 1 \quad \text{for each } (\ell, \ell, R) \in T \qquad (7)$$
$$\forall \bar{x}, \bar{u}, \bar{x}'.\; Q_\ell(\bar{x}) \wedge R \rightarrow \rho_\ell(\bar{x}) \leq 0 \quad \text{for each } (\ell, \ell', R) \in T \text{ with } \ell' \neq \ell \qquad (8)$$


Intuitively, these conditions state that (3,4) hold in the context of the states 
induced by $Q_\ell$. Assuming that $(\ell, \sigma)$ is reachable in $S$ and that 
$\sigma \models Q_\ell$, loop $\ell$ will make at least $\|\rho_\ell(\sigma(\bar{x}))\|$ 
iterations in any execution that starts in $(\ell, \sigma)$. 


Example 5. Recall that the loop in Example 4 only admitted trivial metering 
functions because of the exit transition $\tau_3$. It is easy to see that 
$Q_{\ell_1} \equiv x < y$ verifies (5,6), because $y$ is not modified in $\tau_1$ and $x$ 
decreases, and thus it is a quasi-invariant. In the context of $Q_{\ell_1}$, function 
$\rho_{\ell_1}(x, y) = x + 1$ is metering because when taking $\tau_3$ the value of $x$ is 
guaranteed to be negative, i.e., $\tau_3$ satisfies (8) because 
$x < y \wedge y \leq 0 \rightarrow x + 1 \leq 0$. Notice that $\rho_{\ell_1}(x, y) = x + 1$ 
will still be a valid metering function considering other quasi-invariants of the 
form $Q'_{\ell_1} \equiv y \geq c$ with $c > 0$, as they would completely disable transition $\tau_3$. 
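
Conditions (5)-(8) can be tested for Examples 4 and 5 by enumerating a bounded grid of integer states. This is only a finite sanity check under the assumption that the loop guard is $x \geq 0 \wedge y > 0$ and the exits are $x < 0$ and $y \leq 0$; the function `check` and its parameters are hypothetical names.

```python
from itertools import product

GRID = range(-10, 11)


def check(q, rho):
    """Finite check of (5)-(8) for the loop of Example 4 under candidate q, rho."""
    for x, y in product(GRID, GRID):
        if not q(x, y):
            continue
        if x >= 0 and y > 0:                       # loop transition tau1
            x2, y2 = x - 1, y
            if not q(x2, y2):                      # (5): q is preserved
                return False
            if rho(x, y) - rho(x2, y2) > 1:        # (7): decrease by at most 1
                return False
        if (x < 0 or y <= 0) and rho(x, y) > 0:    # (8): exits tau2 and tau3
            return False
    return any(q(x, y) for x, y in product(GRID, GRID))  # (6): q is satisfiable


def rho_x_plus_1(x, y):
    return x + 1


with_quasi_invariant = check(lambda x, y: x < y, rho_x_plus_1)
without_restriction = check(lambda x, y: True, rho_x_plus_1)
```

The second call corresponds to Example 4, where no restriction on the input is imposed and the check fails at a state such as `x = 0, y = -10`.
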


3.3 Narrowing Guards 


The loops that we have considered so far consist of a single loop transition, 
which makes it easier to find a metering function. This is because there is only 
one way to modify the program variables (with some degree of non-determinism 
induced by the non-deterministic variables). However, when we allow several 
loop transitions, we can have loops for which a non-trivial metering function 
does not exist even when narrowing the set of input values. 


Example 6. Consider the extension of the loop in Example 4 with a new 
transition $\tau_4$ that decrements $y$ (it corresponds to the example in Sect. 1): 


$$\begin{array}{l}
\tau_1 = (\ell_1, \ell_1,\ x \geq 0 \wedge y > 0 \wedge x' = x - 1 \wedge y' = y) \\
\tau_4 = (\ell_1, \ell_1,\ x \geq 0 \wedge y > 0 \wedge x' = x \wedge y' = y - 1) \\
\tau_2 = (\ell_1, \ell_2,\ x < 0 \wedge x' = x \wedge y' = y) \\
\tau_3 = (\ell_1, \ell_2,\ y \leq 0 \wedge x' = x \wedge y' = y)
\end{array}$$


The most precise $LB^w$ of this loop is $\|\rho_{\ell_1}(x, y)\|$ where 
$\rho_{\ell_1}(x, y) = x + y$. As mentioned, this corresponds, e.g., to an execution that 
uses $\tau_1$ until $x = 0$, i.e., $x$ times, and then $\tau_4$ until $y = 0$, i.e., $y$ times. 
It is easy to see that if we start from 
a state that satisfies $x \geq 0 \wedge x \leq y$, then it will be satisfied during the particular 
execution that we just described. Moreover, assuming that 
$Q_{\ell_1} \equiv x \geq 0 \wedge x \leq y$ 
is a quasi-invariant, it is easy to show that together with $\rho_{\ell_1}$ we can verify (7,8), 
and thus $\rho_{\ell_1}$ will be a metering function. However, unfortunately, $Q_{\ell_1}$ is not 
a quasi-invariant since the above loop can make executions other than the one 
described above (e.g., decreasing $y$ to 1 first and then $x$ to 0). 


Our idea to overcome this imprecision is to narrow the set of states for 
which loop transitions are enabled, i.e., strengthening loop guards by additional 
inequalities. This, in principle, reduces the number of possible executions, and 
thus it is more likely to find a metering function (or a better quasi-invariant), 
because now they have to be valid for fewer executions. For example, this might 
force an execution order between the different paths, or even disable some 
transitions by narrowing their guard to false. Again, a metering function for the 
specialized loop is not a valid $LB^b$ of the original loop, but rather it is a valid 
$LB^w$, which is what we are interested in. Next, we state the requirements that 
such narrowing should satisfy. The choice of a narrowing that leads to longer 
executions is discussed in Sect. 4. 

A guard narrowing for a loop transition $\tau \in T$ is a formula $G_\tau(\bar{x})$, over 
variables $\bar{x}$. A specialization of a loop is obtained simply by adding these formulas 
to the corresponding transitions. Conditions (5)-(8) can be specialized to hold 
only for executions that use the specialized loop as follows. Suppose that for a 
loop $\ell \in \mathcal{L}$ we are given a narrowing $G_\tau$ for each loop transition $\tau$, 
then $Q_\ell$ and $\rho_\ell$ are quasi-invariant and metering function resp. for the 
corresponding specialized loop if the following conditions hold 


$$\forall \bar{x}, \bar{u}, \bar{x}'.\; Q_\ell(\bar{x}) \wedge G_\tau(\bar{x}) \wedge R \rightarrow Q_\ell(\bar{x}') \quad \text{for each } \tau = (\ell, \ell, R) \in T \qquad (9)$$
$$\exists \bar{x}.\; Q_\ell(\bar{x}) \qquad (10)$$
$$\forall \bar{x}, \bar{u}, \bar{x}'.\; Q_\ell(\bar{x}) \wedge G_\tau(\bar{x}) \wedge R \rightarrow \rho_\ell(\bar{x}) - \rho_\ell(\bar{x}') \leq 1 \quad \text{for each } \tau = (\ell, \ell, R) \in T \qquad (11)$$
$$\forall \bar{x}.\; Q_\ell(\bar{x}) \wedge R(\bar{x}) \rightarrow \rho_\ell(\bar{x}) \leq 0 \quad \text{for each } (\ell, \ell', R) \in T \text{ with } \ell' \neq \ell \qquad (12)$$


Conditions (9,10) guarantee that $Q_\ell$ is a non-empty quasi-invariant for the 
specialized loop, and conditions (11,12) guarantee that $\rho_\ell$ is a metering function 
for the specialized loop in the context of $Q_\ell$. However, in this case, function $\rho_\ell$ 
induces a lower-bound on the number of iterations only if the specialized loop is 
non-blocking for states in $Q_\ell$. This is illustrated in the following example. 


Example 7. Consider the loop from Example 3 where we have specialized the 
guard of $\tau_1$ by adding $x \geq 5$: 


$$\tau_1 = (\ell_1, \ell_1,\ x \geq 0 \wedge x \geq 5 \wedge x' = x - 1) \qquad \tau_2 = (\ell_1, \ell_2,\ x < 0 \wedge x' = x)$$


With this specialized guard and considering $Q_{\ell_1} \equiv true$, the metering function 
$\rho_{\ell_1}(x) = x + 1$ still satisfies (11,12), and $Q_{\ell_1}$ trivially satisfies (9,10). However, 
$\rho_{\ell_1}$ is not a valid measure of the number of transitions executed because the loop 
gets blocked whenever $x$ takes values $0 \leq x < 5$, and thus it will never execute 
$x + 1$ transitions. 


To guarantee that the specialized loop is non-blocking for states in $Q_\ell$, it is 
enough to require the following condition to hold 


$$\forall \bar{x}.\; Q_\ell(\bar{x}) \rightarrow \bigvee_{\tau = (\ell, \ell, R) \in T} \big( R(\bar{x}) \wedge G_\tau(\bar{x}) \big) \ \vee \bigvee_{\tau = (\ell, \ell', R) \in T,\ \ell' \neq \ell} R(\bar{x}) \qquad (13)$$


Intuitively, it states that from any state in $Q_\ell$ we can make progress, either by 
making a loop iteration or exiting the loop. Assuming that $(\ell, \sigma)$ is reachable in 
$S$ and that $\sigma \models Q_\ell$, the specialized loop $\ell$ will make at least 
$\|\rho_\ell(\sigma(\bar{x}))\|$ iterations in any execution that starts in $(\ell, \sigma)$. 
This also means that the original loop can make at least $\|\rho_\ell(\sigma(\bar{x}))\|$ 
iterations in any execution that starts in $(\ell, \sigma)$. 
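
The blocking behaviour of Example 7 is easy to observe by running the specialized loop. The sketch below assumes the specialized guard $x \geq 0 \wedge x \geq 5$ for the loop transition and $x < 0$ for the exit; `run_specialized` is a hypothetical helper that returns the number of loop iterations together with a flag telling whether the run ended in a blocking state.

```python
def run_specialized(x):
    """Run the loop of Example 7 from value x; return (iterations, blocked)."""
    steps = 0
    while True:
        if x >= 0 and x >= 5:      # specialized guard of tau1
            x, steps = x - 1, steps + 1
        elif x < 0:                # exit transition tau2
            return steps, False
        else:                      # 0 <= x < 5: no transition is enabled
            return steps, True


# States in -3..7 from which the run violates the non-blocking requirement (13).
blocked_inputs = [x for x in range(-3, 8) if run_specialized(x)[1]]
```

Starting from `x = 7` the loop performs only 3 iterations before getting stuck at `x = 4`, far fewer than the 8 suggested by `x + 1`.
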


Example 8. In Example 6, we have seen that if 
$Q_{\ell_1} \equiv x \leq y \wedge x \geq 0$ was a 
quasi-invariant, then function $\rho_{\ell_1}(x, y) = x + y$ becomes metering. We can make 
$Q_{\ell_1}$ a quasi-invariant by specializing the guards of the loop in transitions $\tau_1$ and 
$\tau_4$ to force the following execution with $x + y$ iterations: first use $\tau_1$ until $x = 0$ 
($x$ iterations) and then use $\tau_4$ until $y = 0$ ($y$ iterations). This behavior can be 
forced by taking $G_{\tau_1} \equiv x > 0$ and $G_{\tau_4} \equiv x \leq 0$. With $G_{\tau_1}$ we 
assure that $x$ stops decreasing when $x = 0$, and with $G_{\tau_4}$ we assure that $\tau_4$ is 
used only when $x = 0$. Now, $Q_{\ell_1} \equiv x \leq y \wedge x \geq 0$ and 
$\rho_{\ell_1}(x, y) = x + y$ are valid quasi-invariant 
and metering, resp. Function $\rho_{\ell_1}$ decreases by exactly 1 in $\tau_1$ and $\tau_4$, is 
trivially non-positive in $\tau_2$ because that transition is indeed disabled ($x \geq 0$ from 
$Q_{\ell_1}$ and $x < 0$ from the guard) and is non-positive in $\tau_3$ 
($x \leq y \wedge y \leq 0 \rightarrow x + y \leq 0$). 
Regarding $Q_{\ell_1}$, it verifies (9,10), and more importantly, the loop in $\ell_1$ is 
non-blocking w.r.t. $Q_{\ell_1}$, $G_{\tau_1}$, and $G_{\tau_4}$, i.e., Condition (13) holds. 
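
The reasoning of Example 8 can be replayed on a bounded grid of states. The sketch below is a finite check, not a proof, and it assumes the reading $Q_{\ell_1} \equiv x \leq y \wedge x \geq 0$, loop guard $x \geq 0 \wedge y > 0$, and narrowings $x > 0$ for $\tau_1$ and $x \leq 0$ for $\tau_4$; all function names are ours.

```python
from itertools import product

GRID = range(-8, 9)


def q(x, y):
    return x <= y and x >= 0        # candidate quasi-invariant


def rho(x, y):
    return x + y                    # candidate metering function


def guard(x, y):
    return x >= 0 and y > 0         # original guard of tau1 and tau4


def g1(x, y):
    return x > 0                    # narrowing added to tau1


def g4(x, y):
    return x <= 0                   # narrowing added to tau4


def check_example8():
    """Finite check of (9), (11), (12) and (13); (10) holds, e.g., at x = y = 0."""
    for x, y in product(GRID, GRID):
        if not q(x, y):
            continue
        steps = []
        if guard(x, y) and g1(x, y):
            steps.append((x - 1, y))
        if guard(x, y) and g4(x, y):
            steps.append((x, y - 1))
        for x2, y2 in steps:
            if not q(x2, y2) or rho(x, y) - rho(x2, y2) > 1:   # (9), (11)
                return False
        exits = x < 0 or y <= 0
        if exits and rho(x, y) > 0:                            # (12)
            return False
        if not steps and not exits:                            # (13)
            return False
    return True


def iterations(x, y):
    """Loop iterations performed by the specialized loop from (x, y)."""
    count = 0
    while guard(x, y):
        if g1(x, y):
            x -= 1
        else:
            y -= 1
        count += 1
    return count
```

From every state that satisfies `q`, the simulated loop performs exactly `rho(x, y)` iterations, which matches the execution described in the example.
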


3.4 Narrowing Non-deterministic Choices 


Loop transitions that involve non-deterministic variables might give rise to 
executions of different lengths when starting from the same input values. Since we 
are interested in $LB^w$, we are clearly searching for longer executions. However, 
since our approach is based on inferring $LB^b$, we have to take all executions into 
account, which might result in less precise, or even trivial, $LB^w$. 


Example 9. Consider a modification of the loop in Example 6 in which the 
variable $x$ in $\tau_1$ is decreased by a non-deterministic positive quantity $u$: 


$$\tau_1 = (\ell_1, \ell_1,\ x \geq 0 \wedge y > 0 \wedge x' = x - u \wedge u \geq 1 \wedge y' = y)$$


The effect of this non-deterministic variable $u$ is that $\tau_1$ can be applied $x$ times 
if we always take $u = 1$, $\lceil \frac{x}{2} \rceil$ times if we always take $u = 2$ or even 
only once if we take $u > x$. As a consequence, $\rho_{\ell_1}(x, y) = x + y$ is no longer a valid 
metering function because $x$ can decrease by more than 1 in $\tau_1$. Moreover, 
$Q_{\ell_1} \equiv x \leq y \wedge x \geq 0$ is not a quasi-invariant anymore since 
$x' = x - u \wedge u \geq 1$ does not 
entail $x' \geq 0$. In fact, no metering function involving $x$ will be valid in $\tau_1$ because 
$x$ can decrease by any positive amount. 


To handle this complex situation, we propose narrowing the space of 
non-deterministic choices, so that metering functions have to be valid w.r.t. fewer 
executions and are thus more likely to be found and to be more precise. Next we state the 
requirements that such narrowing should satisfy. The choice of a narrowing that 
leads to longer executions is discussed in Sect. 4. 

A non-deterministic variables narrowing for a loop transition $\tau \in T$ is a 
formula $U_\tau(\bar{x}, \bar{u})$, over variables $\bar{x}$ and $\bar{u}$, that is added to 
$\tau$ to restrict the choices for variables $\bar{u}$. A specialized loop is now obtained by 
adding both $G_\tau$ and $U_\tau$ to the corresponding transitions. Suppose that for loop 
$\ell \in \mathcal{L}$, in addition to $G_\tau$, we are also given $U_\tau$ for each of its loop 
transitions $\tau$. For $Q_\ell$ and $\rho_\ell$ to 
be quasi-invariant and metering function for the specialized loop $\ell$, we require 
conditions (9)-(13) to hold but after adding $U_\tau$ to the left-hand side of the 
implications in (9) and (11). Besides, unlike narrowing of guards, narrowing of 
non-deterministic choices might make a transition invalid, i.e., not satisfying 
Condition (1), and thus $\|\rho_\ell(\bar{x})\|$ cannot be used as a lower-bound on the number 
of iterations. To guarantee that specialized transitions are valid we require, in 
addition, the following condition to hold 


$$\forall \bar{x}\, \exists \bar{u}.\; Q_\ell(\bar{x}) \wedge R(\bar{x}) \wedge G_\tau(\bar{x}) \rightarrow R(\bar{x}, \bar{u}) \wedge U_\tau(\bar{x}, \bar{u}) \quad \text{for each } \tau = (\ell, \ell, R) \in T \qquad (14)$$


This condition is basically (1) taking into account the inequalities introduced by 
the corresponding narrowings. Assuming that $(\ell, \sigma)$ is reachable in $S$ and that 
$\sigma \models Q_\ell$, the specialized loop $\ell$ will make at least 
$\|\rho_\ell(\sigma(\bar{x}))\|$ iterations in any 
execution that starts in $(\ell, \sigma)$, which also means, as before, that the original loop 
can make at least $\|\rho_\ell(\sigma(\bar{x}))\|$ iterations in any execution that starts in 
$(\ell, \sigma)$. 


Example 10. To solve the problems shown in Example 9 we need to narrow 
the non-deterministic variable $u$ to take bounded values that reflect the 
worst-case execution of the loop. Concretely, we need to take $U_{\tau_1} \equiv u \leq 1$, which 
combined with $u \geq 1$ entails $u = 1$ so $x$ decreases by exactly 1 in $\tau_1$. 
Considering the narrowing $U_{\tau_1}$, the resulting loop is equivalent to the one presented in 
Example 8 so we could obtain the precise metering function $\rho_{\ell_1}(x, y) = x + y$ 
with the quasi-invariant $Q_{\ell_1} \equiv x \leq y \wedge x \geq 0$. Note that (14) holds for 
$\tau_1$ because $u = 1$ makes the consequent true for every value of $x$ and $y$: 
$$\forall \bar{x}\, \exists \bar{u}.\; (x \leq y \wedge x \geq 0) \wedge (x \geq 0 \wedge y > 0) \wedge x > 0 \rightarrow u \geq 1 \wedge u \leq 1$$
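
Condition (14) for Example 10 can be illustrated by searching, for each state that satisfies the premise, for a witness value of $u$. The sketch is a bounded search under the assumed reading $Q_{\ell_1} \equiv x \leq y \wedge x \geq 0$, guard $x \geq 0 \wedge y > 0$ and $G_{\tau_1} \equiv x > 0$; `satisfies_14` and its parameters are hypothetical names.

```python
def satisfies_14(narrowing, bound=8, witnesses=range(-3, 9)):
    """Bounded check of (14) for tau1 of Example 9 under a narrowing U(x, y, u)."""
    for x in range(0, bound):
        for y in range(x, bound):
            premise = (x <= y and x >= 0) and (x >= 0 and y > 0) and x > 0
            if not premise:
                continue
            # R(x, u) is u >= 1; the narrowing must leave at least one such u.
            if not any(u >= 1 and narrowing(x, y, u) for u in witnesses):
                return False
    return True


valid_narrowing = satisfies_14(lambda x, y, u: u <= 1)     # U: u <= 1, witness u = 1
invalid_narrowing = satisfies_14(lambda x, y, u: u <= 0)   # contradicts u >= 1
```

The second narrowing makes the transition invalid in the sense of Condition (1), which is the situation that (14) is meant to rule out.
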


3.5 Ensuring the Feasibility of the Specialized Loops 


In order to enable the propagation of the local lower-bounds back to the input 
location (as we have discussed at the beginning of Sect. 3), we have to ensure that 
there is actually an execution that starts in $\ell_0$ and passes through the specialized 
loop. In other words, we have to guarantee that when putting all specialized loops 
together, they still form a non-blocking TS for some set of input values. We 
achieve this by requiring that the quasi-invariants of the preceding loops ensure 
that the considered quasi-invariant for this loop also holds on initialization (i.e., 
it is an invariant for the considered context). Technically, we require, in addition 
to (9)-(14), the following conditions to hold for each loop $\ell$: 


$$\forall \bar{x}, \bar{u}, \bar{x}'.\; Q_{\ell'}(\bar{x}) \wedge R \rightarrow Q_\ell(\bar{x}') \quad \text{for each } (\ell', \ell, R) \in T \qquad (15)$$
$$\forall \bar{x}.\; Q_{\ell_0}(\bar{x}) \rightarrow \Theta(\bar{x}) \qquad (16)$$


Condition (15) means that transitions entering loop $\ell$, strengthened with the 
quasi-invariant of the preceding location $\ell'$, must lead to states within the 
quasi-invariant $Q_\ell$. Condition (16) guarantees that $Q_{\ell_0}$ defines valid input 
values, i.e., within the initial condition $\Theta$. 


Theorem 1 (soundness). Given $Q_\ell$ for each non-exit location $\ell \in \mathcal{L}$, 
narrowings $G_\tau$ and $U_\tau$ for each loop transition $\tau \in T$, and function 
$\rho_\ell$ for each loop location $\ell$, such that (9)-(16) are satisfied, it holds: 


1. The TS $S'$ obtained from $S$ by adding $G_\tau$ and $U_\tau$ to the corresponding 
transitions, and changing the initial condition to $Q_{\ell_0}$, is non-blocking. 

2. For any complete trace $t$ of $S'$, if $C = (\ell, \sigma)$ is a configuration in $t$, then $t$ 
includes at least $\|\rho_\ell(\sigma(\bar{x}))\|$ visits to $\ell$ after $C$ (i.e., 
$\|\rho_\ell(\bar{x})\|$ is a lower-bound 
function on the number of iterations of the loop defined by location $\ell$). 


The proof of this soundness result is straightforward: it follows as a sequence of 
facts using the definitions of the conditions (9)-(16) given in this section. 

We note that when there is an unbounded overlap between the guards of the 
loop transitions and the guards of exit transitions, it is likely that a non-trivial 
metering function does not exist because it must be non-positive on the over- 
lapping states. To overcome this limitation, instead of using the exit transitions 
in (12), we can use ones that correspond to the negation of the guards of loop 
transitions, and thus it is ensured that they do not overlap. However, we should 
require (13) to hold for the original exit transitions as well in order to ensure 
that the non-blocking property holds. Another way to overcome this limitation 
is to simply strengthen the exit transitions by the negation of the guards. 

As a final comment, we note that it is not needed to assume that the TS S 
that we start with is non-blocking (even though we have done so in Sect. 2.1 
for clarity). This is because our formalization above finds a subset of S (S’ in 
Theorem 1) that is non-blocking, which is enough to ensure the feasibility of the 
local lower-bounds. This is useful not only for enlarging the set of TSs that we 
accept as input, but also allows us to start the analysis from any subset of S 
that includes a path from $\ell_0$ to the exit location. For example, it can be used to 
remove trivial execution paths from $S$, or concentrate on ones that include more 
sequences of loops (since we are interested in $LB^w$). 


3.6 Handling General TSs 


So far we have considered a special case of TSs in which all locations, except 
the entry and exit ones, are multi-path loops. Next we explain how to handle 
the general case. It is easy to see that we can allow locations that correspond to 
trivial SCCs. These correspond to paths that connect loops and might include 
branching as well. For such locations, there is no need to infer metering functions 
or apply any specialization, we only need to assign them quasi-invariants that 
satisfy (15) to guarantee that the overall specialized TS is non-blocking. 

The more elaborate case is when the TS includes non-trivial SCCs that do 
not form a multi-path loop. In such a case, if an SCC has a single cut-point, we 
can unfold its edges and transform it into a multi-path loop following the techniques 
of [1]. It is important to note that when merging two transitions, the cost of the 
new one is the sum of their costs. In this case the number of iterations is still 
a lower-bound on the cost of the loop, however, we might get a better one by 
multiplying it by the minimal cost of its transitions. 

If a SCC cannot be transformed into a multi-path loop by unfolding its 
transitions, then it might correspond to a nested loop, and, in such case, we 
can recover the nesting structure and consider them as separate TSs that are 
“called” from the outer one using loop extraction techniques [25]. Each inner loop 
is then analyzed separately, and replaced (in the original TS, where it is “called”) 
by a single edge with its lower-bound as cost for that edge, and then the outer one is 
analyzed taking that cost into account. Besides, to guarantee that the specialized 
program corresponds to a valid execution, we require the quasi-invariant of the 
inner loop to hold in the context of the quasi-invariant of the outer loop. This 
approach is rather standard in cost analysis of structured programs [1,3, 12]. 

Another issue is how to compose the (local) lower-bounds of the specialized
loops into a global lower-bound. For this, we can rely on the techniques of [1,3]
that rewrite the local lower-bounds in terms of the input values by relying on
invariant generation and recurrence relation solving.


4 Inference Using Max-SMT 


This section presents how metering functions and narrowings can be inferred
automatically using Max-SMT, namely how to automatically infer all $G_\tau$, $U_\tau$,
$Q_\ell$, and $\rho_\ell$ such that (9)-(16) are satisfied. We do it in a modular way, i.e., we
seek $G_\tau$, $U_\tau$, $Q_\ell$, and $\rho_\ell$ for one loop at a time following a (reversed) topological
order of the SCCs, as we describe next. Recall that (16) is required only for loops
connected directly to $\ell_0$, and w.l.o.g. we assume there is only one such loop.


4.1 A Template-Based Verification Approach 


We first show how the template-based approach of [6,17] can be used to find $G_\tau$,
$U_\tau$, and $Q_\ell$ by representing them as template constraint systems, i.e., each is a
conjunction of linear constraints where coefficients and constants are unknowns.
Also, $\rho_\ell$ is represented as a linear template function $\bar{a} \cdot \bar{x} + a_0$ where $(a_0, \bar{a})$ are
unknowns. Then, the problem is to find concrete values for the unknowns such
that all formulas generated by (9)-(16) are satisfied:


— Each $\forall$-formula generated by (9)-(16), except those of (14) that we handle
below, can be viewed as an $\exists\forall$ problem where the $\exists$ is over the unknowns of the
templates and the $\forall$ is over (some of) the program variables. It is well-known
that solving such an $\exists\forall$ problem, i.e., finding values for the unknowns, can be
done by translating it into a corresponding $\exists$ problem over the existentially
quantified variables (i.e., the unknowns) using Farkas’ lemma [20], which can
then be solved using an off-the-shelf SMT solver.
— To handle (14) we follow [17], and eliminate $\exists \bar{u}$ using the skolemization
$u_i = \bar{a} \cdot \bar{x} + a_0$ where $(a_0, \bar{a})$ are fresh unknowns (different for each $u_i$).
This allows handling it using Farkas’ lemma as well. However, in addition,
when solving the corresponding $\exists$ problem we require all $(a_0, \bar{a})$ to be integer.
This is because the domain of program variables is the integers, and picking
integer values for all $(a_0, \bar{a})$ guarantees that the values of any $x_i'$ that depends
on $\bar{u}$ will be integer as well.¹
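As a reminder, the variant of Farkas’ lemma that is typically used for this kind of translation can be stated as follows. This is the standard affine formulation written in generic notation; the symbols $A$, $\bar{b}$, $\bar{c}$, $d$, and $\bar{\lambda}$ are illustrative and are not taken from conditions (9)-(16).

```latex
\text{If } \{\bar{x} \mid A\bar{x} \le \bar{b}\} \neq \emptyset, \text{ then}
\qquad
\big(\forall \bar{x}.\ A\bar{x} \le \bar{b} \rightarrow \bar{c}\cdot\bar{x} \le d\big)
\iff
\big(\exists \bar{\lambda} \ge \bar{0}.\ \bar{\lambda}^{T} A = \bar{c}^{T} \wedge \bar{\lambda}^{T}\bar{b} \le d\big)
```

The equivalence holds over the rationals; over the integers only the right-to-left direction is guaranteed, so an encoding based on it remains sound. When $A$ and $\bar{b}$ come from templates, the products $\bar{\lambda}^{T} A$ and $\bar{\lambda}^{T}\bar{b}$ multiply unknowns by unknowns, which is one source of non-linearity in the resulting $\exists$ problem.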


The size of templates for $G_\tau$, $U_\tau$, and $Q_\ell$, i.e., the number of inequalities, is
crucial for precision and performance. The larger the size is, the more likely 
that we get a solution if one exists, but also the worse the performance is (as 
the corresponding SMT problem will include more constraints and variables). In 
practice, one typically starts with templates of size 1, and iteratively increases 
it by 1 when failing to find values for the unknowns, until a solution is found or 
the bound on the size is reached. 
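The iterative strategy just described can be sketched as follows. This is only an illustration in Python: the callback `try_solve` stands for a hypothetical routine that builds templates of the given size, encodes (9)-(16), and queries the SMT solver; neither it nor `toy_solver` is part of any tool discussed here.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def solve_with_growing_templates(
    try_solve: Callable[[int], Optional[S]],
    max_size: int,
) -> Optional[S]:
    """Try template sizes 1, 2, ..., max_size and return the first solution."""
    for size in range(1, max_size + 1):
        solution = try_solve(size)  # hypothetical call into the SMT back end
        if solution is not None:
            return solution
    return None  # bound on the template size reached without a solution


# Toy stand-in for the solver: pretend a solution only exists from size 3 on.
def toy_solver(size: int) -> Optional[str]:
    return f"solution with {size} inequalities" if size >= 3 else None
```

With the toy solver, a bound of 5 yields the size-3 solution, whereas a bound of 2 yields no solution at all, which mirrors the trade-off between precision and performance mentioned above.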

Alternatively, we can use the approach of [17] to construct $G_\tau$, $U_\tau$, and $Q_\ell$
incrementally. This starts with templates of size 1, but instead of requiring all (9)-(16)
to hold, the conditions generated by (12) are marked as soft constraints
(i.e., we accept solutions in which they do not hold) and use Max-SMT to get 
a solution that satisfies as many of such soft conditions as possible. If all are 
satisfied, we are done, if not, we use the current solution to instantiate the 
templates, and then add another template inequality to each of them and repeat 
the process again. This means that at any given moment, each template will 
include at most one inequality with unknowns. Finally, to guarantee progress 
from one iteration to another, soft conditions that hold at some iteration are 
required to hold at the next one, i.e., they become hard. 

The use of (12) as soft constraint is based on the observation [12] that when 
seeking a metering function, the problematic part is often to guarantee that 
it is negative on exit transitions, which is normally achieved by adding quasi- 
invariants that are incrementally inferred. By requiring (12) to be soft we handle 
more exit transitions as the quasi-invariant gets stronger until all are covered. 


4.2 Better Quality Solutions 


The precision can also be affected by the quality of the solution picked by the
SMT solver for the corresponding $\exists$ problem. Since there might be many metering
functions that satisfy (9)-(16), we are interested in narrowing the search
space of the SMT solver in order to find more accurate ones, i.e., ones that lead to longer
executions. Next we present some techniques for this purpose.


¹ Because we assumed that constraints involving primed variables are of the form
$x_i' = \bar{a} \cdot \bar{x} + \bar{b} \cdot \bar{u} + c$.


Enabling More Loop Transitions. We are interested in guard narrowings that
keep as many loop transitions as possible, since such narrowings are more likely
to generate longer executions. This can be done by requiring the following to
hold


$$\exists \bar{x}.\ \bigvee_{\tau=(\ell,\ell,R)\in T} \big(Q_\ell(\bar{x}) \wedge R(\bar{x}) \wedge G_\tau(\bar{x})\big) \qquad (17)$$


We also use Max-SMT to require a solution that satisfies as many disjuncts as
possible, thus eliminating fewer loop transitions (if $Q_\ell(\bar{x}) \wedge R(\bar{x}) \wedge G_\tau(\bar{x})$ is
false for a transition $\tau$, then it is actually disabled). Note that this condition
can be used instead of (10), which requires the quasi-invariant to be non-empty.


Larger Metering Functions. We are interested in metering functions that lead 
to longer executions. One way to achieve this is to require metering functions to 
be ranking as well, i.e., in addition to (11) we require the following to hold 


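$$\forall \bar{x}, \bar{u}, \bar{x}'.\ Q_\ell(\bar{x}) \wedge G_\tau(\bar{x}) \wedge U_\tau(\bar{x}, \bar{u}) \wedge R \rightarrow \rho_\ell(\bar{x}) - \rho_\ell(\bar{x}') \geq 1 \quad \text{for each } \tau = (\ell,\ell,R) \in T \qquad (18)$$
$$\forall \bar{x}, \bar{u}, \bar{x}'.\ Q_\ell(\bar{x}) \wedge G_\tau(\bar{x}) \wedge R \rightarrow \rho_\ell(\bar{x}) \geq 0 \quad \text{for each } \tau = (\ell,\ell,R) \in T \qquad (19)$$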


These new conditions are added as soft constraints, and we use Max-SMT to 
ask for a solution that satisfies as many conditions as possible. 
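To illustrate why the ranking conditions select larger metering functions, the following Python sketch checks two candidates by brute force on a loop with the single transition x > 0 ∧ x' = x − 1. It assumes that, for such a loop without narrowing or quasi-invariants, the metering conditions amount to the usual ones (the function decreases by at most 1 per iteration and is non-positive when the guard fails); the helper names and the finite test range are illustrative only.

```python
def guard(x: int) -> bool:
    # Loop guard of the single transition x > 0 /\ x' = x - 1.
    return x > 0


def step(x: int) -> int:
    # Update of the transition.
    return x - 1


def is_metering(rho, xs) -> bool:
    """Assumed metering conditions: decrease <= 1 inside, rho <= 0 on exit."""
    return all(
        (rho(x) - rho(step(x)) <= 1) if guard(x) else (rho(x) <= 0)
        for x in xs
    )


def is_ranking(rho, xs) -> bool:
    """Conditions in the style of (18) and (19): decrease >= 1 and rho >= 0."""
    return all(
        rho(x) - rho(step(x)) >= 1 and rho(x) >= 0
        for x in xs
        if guard(x)
    )


xs = range(-20, 21)            # finite sample of the integers, not a proof
candidates = {"x": lambda x: x, "0": lambda x: 0}
results = {
    name: (is_metering(rho, xs), is_ranking(rho, xs))
    for name, rho in candidates.items()
}
print(results)
```

On this sample both candidates pass the metering check, but only the function x also passes the ranking check. The constant 0 is a valid yet useless metering function, and asking for the ranking conditions as soft constraints steers the solver away from it.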


Unbounded Metric Functions. We are interested in metering functions that do 
not have an upper bound, since otherwise they will lead to constant lower-bound 
functions. For example, for a loop with a transition $x > 0 \wedge x' = x - 1$, we want
to avoid quasi-invariants like $x < 5$ which would make the metering function $x$
bounded by 5. For this, we rely on the following lemma.


Lemma 1. A function $\rho(\bar{x}) = \bar{a} \cdot \bar{x} + a_0$ is unbounded over a polyhedron $P$ iff
$\bar{a} \cdot \bar{y}$ is positive on at least one ray $\bar{y}$ of the recession cone of $P$.


It is known that for a polyhedron $P$ given in constraints representation, its
recession cone $cone(P)$ is the set specified by the constraints of $P$ after removing
all free constants. Now we can use the above lemma to require that the metering
function $\rho_\ell(\bar{x}) = \bar{a} \cdot \bar{x} + a_0$ is unbounded in the quasi-invariant $Q_\ell$ by requiring
the following condition to hold


$$\exists \bar{x}.\ cone(Q_\ell) \wedge \bar{a} \cdot \bar{x} > 0 \qquad (20)$$


where $cone(Q_\ell)$ is obtained from the template of $Q_\ell$ by removing all (unknowns
corresponding to) free constants, i.e., it is the recession cone of $Q_\ell$.
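As a small worked instance of this condition, consider the metering function $\rho(x) = x$ (so $\bar{a} = (1)$) from the example above, together with two illustrative quasi-invariants $Q$ and $Q'$ chosen here only for the purpose of the example.

```latex
\begin{align*}
Q  &= \{x \mid x \ge 0 \wedge x \le 5\}: &
cone(Q)  &= \{x \mid x \ge 0 \wedge x \le 0\} = \{0\}, &
&\nexists x \in cone(Q).\ 1 \cdot x > 0 \\
Q' &= \{x \mid x \ge 0\}: &
cone(Q') &= \{x \mid x \ge 0\}, &
&x = 1 \in cone(Q') \text{ and } 1 \cdot 1 > 0
\end{align*}
```

The condition fails for $Q$, where $\rho$ is bounded by 5, and holds for $Q'$, where $\rho$ is unbounded, in agreement with Lemma 1.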

Note that all encodings discussed in this section generate non-linear SMT
problems, because they either correspond to $\exists\forall$ problems that include templates
on the left-hand side of implications, or to $\exists$ problems over templates that include
both program variables and unknowns.

Finally, it is important to note that the optimizations described provide theoretical
guarantees to get better lower bounds: the one that adds (18) and (19) leads to
a bound that corresponds exactly to the worst-case execution (of the specialized
program), and the one that uses (20) is essential to avoid constant bounds.


5 Implementation and Experimental Evaluation


We have implemented a LOwer-Bound synthesizER, named LOBER, that can 
be used from an online web interface at http://costa.fdi.ucm.es/lober. LOBER is 
built as a pipeline with the following processes: (1) it first reads a KoAT file [5] 
and generates a corresponding set of multi-path loops, by extracting parts of the 
TS that correspond to loops [25], applying unfolding, and inferring loop sum- 
maries to be used in the calling context of nested loops, as explained in Sect. 3.6; 
(2) it then encodes in SMT the conditions (9)—(13) defined through the paper, for 
each loop separately, by using template generation, a process that involves sev- 
eral non-trivial implementations using Farkas’ lemma (this part is implemented 
in Java and uses Z3 [8] for simple (linear) satisfiability checks when produc- 
ing the Max-SMT encoding); (3) the problem is solved using the SMT solver 
Barcelogic [4], as it allows us to use non-linear arithmetic and Max-SMT capa- 
bilities in order to assert soft conditions and implement the solutions described 
in Sect. 4; (4) in order to guarantee the correctness of the results of our system, we
have added to the pipeline an additional checker that uses Z3 to prove that the obtained
metering function and quasi-invariants verify conditions (9)-(13).

To empirically evaluate the results of our approach, we have used benchmarks from
the Termination Problem Data Base (TPDB), namely those from the category
Complexity_ITS that contains Integer Transition Systems. We have removed
non-terminating TSs and terminating TSs whose cost is unbounded (i.e., the
cost depends on some non-deterministic variables and can be arbitrarily high)
or non-linear, because they are outside the scope of our approach. In total, we
have considered a set of 473 multi-path loops from which we have excluded 13
that were non-linear. Analyzing these 473 programs took 199 min, an average of
approximately 25 s per program. For 255 of them, it took less than 1 s.

Table 1 illustrates our results and compares them to those obtained by the
LoAT [12,13] system, which also outputs a pair $(\rho, Q)$ of a lower-bound function
$\rho$ and initial conditions $Q$ on the input for which $\rho$ is a valid lower-bound.
In order to automatically compare the results obtained by the two systems,
we have implemented a comparator that first expresses costs as functions
$f : \mathbb{N} \to \mathbb{R}_{\geq 0}$ over a single variable $n$ and then checks which function is greater. To
obtain this unary cost function from the results $(\rho, Q)$, we use convex polyhedra
manipulation libraries to maximize the obtained cost $\rho$ wrt. $Q \wedge -n \leq x_i \leq n$,
where $x_i$ are the TS variables, and express the maximized expression in terms of
$n$. Therefore, $f(n)$ represents the maximum cost when the variables are bounded
by $|x_i| \leq n$ and satisfy the corresponding initial condition $Q$, a notion very
similar to the runtime complexity used in [12,13]. Once we have both unary
linear costs $f_1(n) = k_1 n + d_1$ and $f_2(n) = k_2 n + d_2$, we compare them for $n \geq 0$
by inspecting $k_1$ and $k_2$.
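A much simplified version of such a comparator can be sketched in Python as follows. The sketch makes two assumptions that are not claims about the actual implementation: the initial condition Q is ignored, so that the cost is maximized over the box of variables bounded by n in absolute value only, and two unary costs are ranked by their slopes alone. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def unary_cost(coeffs: Sequence[int], const: int) -> Tuple[int, int]:
    """Maximize a.x + a0 over the box -n <= x_i <= n (Q is ignored here).

    Each term a_i * x_i is maximal at x_i = n or x_i = -n, so the maximum
    is (sum of |a_i|) * n + a0; the pair (k, d) of f(n) = k*n + d is returned.
    """
    return sum(abs(a) for a in coeffs), const


def compare_linear_costs(k1: int, d1: int, k2: int, d2: int) -> str:
    """Rank f1(n) = k1*n + d1 against f2(n) = k2*n + d2 by slope only."""
    if k1 > k2:
        return ">"
    if k1 < k2:
        return "<"
    return "="  # same slope: the constants d1, d2 are deliberately ignored


# Hypothetical bounds: rho1 = x - 2y + 3 versus rho2 = x.
f1 = unary_cost([1, -2], 3)
f2 = unary_cost([1], 0)
verdict = compare_linear_costs(*f1, *f2)
```

Comparing slopes only matches the idea that one bound is better than another when it leads to unboundedly longer executions, whereas bounds that differ by a constant are treated as equivalent.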

Table 1. Results of the experiments.

Benchmark set           Total    =    >    <
BROCKSCHMIDT_16
  c-examples/ABC           33   33    0    0
  c-examples/SPEED         29   25    4    0
  c-examples/WTC           45   39    4    2
  c-examples/Misc           9    9    0    0
  costa                     6    5    1    0
  FGPSF09/Beerendonk        ?    ?    ?    ?
  FGPSF09/patrs             ?    ?    ?    ?
  FGPSF09/Misc              ?    ?    ?    ?
  KoAT-2013                10   10    0    0
  KoAT-2014                14   14    0    0
  SAS10                    46   40    1    5
FLORES-MONTOYA_16         176  158   16    2
HARK-20
  Ben_Amram_Genaim          ?    ?    ?    ?
  Nils_2019                16   16    0    0


Each row of the table contains the number of loops for which both tools
obtain the same result (=), the number of loops where LOBER is better than
LoAT (>) and the number of loops where LoAT is better than LOBER (<). The
subcategories are obtained directly from the name of the innermost folder, except
when this folder contains too few examples, in which case we merge them
all in a Misc folder in the parent directory. The total number of loops that are
considered in each subcategory appears in column Total. BROCKSCHMIDT_16
and HARK-20 have their first row empty as all their results are contained in their
subcategories. Globally, both tools behave the same in 412 programs (column 
“=” ), obtaining equivalent linear lower bounds in 376 of them and a constant 
lower bound in the remaining ones. Our tool LOBER achieves a better accuracy in 
37 programs (column “>”), while LoAT is more precise in 11 programs (column 
“<”). Let us discuss the two sets of programs in which both tools differ. As 
regards the 37 examples for which we get better results, we have that LoAT 
crashes in 4 cases and it can only find a constant lower bound in 1 example 
while our tool is able to find a path of linear length by introducing the necessary 
quasi-invariants. For the remaining 32 loops, both tools get a linear bound, 
but LOBER finds one that leads to an unboundedly longer execution: 18 of 
these loops correspond to cases that have implicit relations between the different 
execution paths (like our running examples) and require semantic reasoning; for 
the remaining 14, we get a better set of quasi-invariants. The following techniques 
have been needed to get such results in these 37 better cases (note that (i) is 
not mutually exclusive with the others): 


(i) 1 needs narrowing non-deterministic choices, 

(ii) 5 do not need quasi-invariants nor guard narrowing, 

(iii) 14 need quasi-invariants only, 

(iv) 18 need both quasi-invariants and guard narrowing (in 3 of them this is 
only used to disable transitions). 


Therefore, this shows experimentally the relevance of all components within our
framework and its practical applicability thanks to the good performance of the
Max-SMT solver on non-linear arithmetic problems. In general, over the whole set of
programs, we can solve 308 examples without quasi-invariants and 444 without
guard narrowing. The intersection of these two sets contains 298 examples (63% of the
programs), which leaves 175 programs that need some of the proposed
techniques to be solved.

As regards the 11 examples for which we get worse results than LoAT, we 
have two situations: (1) In 6 cases, the SMT solver is not able to find a solution.
We noticed that too many quasi-invariants were required, which made the SMT
problem too hard. To improve our results, we could start, as a preprocessing step,
from a quasi-invariant that includes all invariant inequalities that syntactically
appear in the loop transitions, something similar to what is done by LoAT when
inferring what they call conditional metering functions [12]. This is left for future
experimentation. (2) In the other 5 cases, our tool finds a linear bound but with a
worse set of quasi-invariants, which makes the LoAT bound provide unboundedly 
longer executions. We are investigating whether this can be improved by adding 
new soft constraints that guide the solver to find these better solutions. Finally, 
let us mention that, for the 13 problems for which LoAT gives a non-linear bound
and that have been excluded from our benchmarks as justified above, we get a linear
bound for the 12 that have a polynomial bound (of degree 2 or more), and a 
constant bound for the additional one that has a logarithmic lower bound. This 
is the best we can obtain as our approach focuses on the inference of precise 
local linear bounds, as they constitute the most common type of loops. 

All in all, we argue that our experimental results are promising: the number of
benchmarks for which we get more accurate results (37) is more than three times
the number for which LoAT is more accurate (11) and, besides, many of those examples
correspond to complex loops that lead to worse results when disconnecting transitions.
We also see room for further improvement, as most examples for which LoAT outperforms
us could be handled as accurately as LoAT does with better quasi-invariants (whose
inference is somehow a black-box component in our framework). Syntactic strategies
that use invariant inequalities that appear in the transitions, like those used in LoAT,
would help, as well as further improvements in SMT non-linear arithmetic.


Application Domains. The accuracy gains obtained by LOBER have applications 
in several domains in which knowing the precise cost can be fundamental. This is 
the case for predicting the gas usage [26] of executing smart contracts, where gas 
cost amounts to monetary fees. The caller of a transaction needs to include a gas 
limit to run it. Giving a too low gas limit can end in an “out of gas” exception 
and giving a too high gas limit can end in a “not enough eth (money)” error. 
Therefore having a tighter prediction is needed to be safe on both sides. Also, 
when the UB is equal to the LB, we have an exact estimation, e.g., we would know 
precisely the runtime or memory consumption of the most costly executions. This 
can be crucial in safety-critical applications and has been used as well to detect 
potential vulnerabilities such as denial-of-service attacks. In
https://apps.dtic.mil/sti/pdfs/AD1097796.pdf, vulnerabilities are detected in situations in which
both bounds do not coincide. For instance, in password verification programs, if
the UB and LB differ due to a difference in the delays associated with how many
characters are right in the guessed password, this is identified as a potential
attack.


6 Related Work and Conclusions 


We have proposed a novel approach to synthesize precise lower-bounds for
integer non-deterministic programs. The main novelties are the use of loop
specialization to facilitate the task of finding a (precise) metering function and
the Max-SMT encoding to find larger (better) solutions. Our work is related
to two lines of research: (1) non-termination analysis and (2) LB inference.
In both kinds of analysis, one aims at finding classes of inputs for which the 
program features a non-terminating behavior (1) or a cost-expensive behavior 
(2). Therefore, techniques developed for non-termination might provide a good 
basis for developing a LB analysis. In this sense, our work exploits ideas from 
the Max-SMT approach to non-termination in [17]. The main idea borrowed 
from [17] has been the use of quasi-invariants to specialize loops towards the 
desired behavior: in our case towards the search of a metering function, while 
in theirs towards the search of a non-termination proof. However, there are 
fundamental differences since we have proposed other new forms of loop spe- 
cialization (see a more detailed comparison in Sect. 1) and have been able to 
adapt the use of Max-SMT to accurately solve our problem (i.e., find larger 
bounds). As mentioned in Sect. 1, our loop specialization technique can be
used to gain precision in non-termination analysis [17]. For instance, in this
loop: “while (x>=0 and y>=0) {if (*) {x++; y--;} else {x--; y++;}}” no sub-SCC
(considering only one of the transitions) is non-terminating and no quasi-invariant
can be found to ensure we will stay in the loop (when considering both
transitions), hence it cannot be handled by [17]. Instead, if we narrow the transitions
by adding y >= x in the if-condition (and hence x > y in the else), we can
prove that x >= 0 ∧ y >= 0 ∧ x + y = 1 is a quasi-invariant, which allows us to
prove non-termination in the way of [17] (as we will stay in the loop forever).
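The effect of this narrowing can be observed with a small simulation. The following Python sketch is only illustrative: it assumes that the narrowed if-condition is y >= x (so the else-branch is taken when x > y), and the function names are hypothetical. A simulation over finitely many steps does not prove non-termination; it merely shows that the quasi-invariant is preserved along the run.

```python
from typing import Tuple


def narrowed_step(x: int, y: int) -> Tuple[int, int]:
    """One loop iteration with the non-deterministic choice narrowed."""
    if y >= x:                # narrowed if-branch: x++; y--
        return x + 1, y - 1
    return x - 1, y + 1       # else-branch (x > y): x--; y++


def quasi_invariant(x: int, y: int) -> bool:
    # x >= 0 /\ y >= 0 /\ x + y = 1; it implies the loop guard.
    return x >= 0 and y >= 0 and x + y == 1


def stays_in_loop(x: int, y: int, steps: int) -> bool:
    """Check that the quasi-invariant holds before and after every step."""
    for _ in range(steps):
        if not quasi_invariant(x, y):
            return False
        x, y = narrowed_step(x, y)
    return quasi_invariant(x, y)
```

Starting from (0, 1) or (1, 0), the run alternates between these two states and never violates the loop guard, whereas a state such as (2, 0) does not satisfy the quasi-invariant in the first place.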
As regards LB inference, the current state-of-the-art is the work by Frohn et 
al. [12,13] that introduces the notion of metering function and acceleration. Our 
work indeed tries to recover the semantic loss in [12,13] due to defining metering 
functions for simple loops and combining them in a later stage using accelera- 
tion. Technically, we only share with this work the basic definition of metering 
function in Sect. 3.1. Indeed, the definition in conditions (3) and (4) already 
generalizes the one in [12,13] since it is not restricted to simple loops. This 
definition is improved in the following sections with several loop specializations. 
While [12,13] relies on pure SMT to solve the problem, we propose to gain preci- 
sion using Max-SMT. We believe that similar ideas could be adapted by [12,13]. 
Due to the different technical approaches underlying both frameworks, their 
accuracy and efficiency must be compared experimentally wrt. the LoAT system 
that implements the ideas in [12,13]. We argue that the results in Sect. 5 justify 
the important gains of using our new framework and prove experimentally that, 
the fact that we do not lose semantic relations in the search of metering func- 
tions is key to infer LB for challenging cases in which [12,13] fails. Originally, 
the LoAT [12,13] system only accelerated simple loops by using metering func- 
tions, so the overall precision of the lower bound relied on obtaining valid and 
precise metering functions. However, the framework in [12,13] is independent of 
the accelerating technique applied. In order to increase the number of simple 
loops that can be accelerated, Frohn [11] proposes a calculus to combine differ- 
ent conditional acceleration techniques (monotonic increase/decrease, eventual 
increase/decrease, and metering functions). These conditional acceleration techniques
assume that all the iterations of the loop verify some condition y, and
the calculus applies the techniques in order and extracts those conditions y from
fragments of the loop guard. Although more precise and powerful, the combined 
acceleration calculus considers only simple loops, so it does not solve the preci- 
sion loss when the loop cost involves several interleaved transitions. Moreover, 
the techniques in [11] are integrated into LoAT, so the experimental evaluation 
in Sect. 5 compares our approach to the framework in [12,13] extended with 
several techniques to accelerate loops (not only metering functions). 

Finally, our approach presents similarities to the CTL* verification for ITS 
in [7] as both extend transition guards of the original ITS. The difference is 
that in [7] the added constraints only contain newly created prophecy vari- 
ables and the transitions to modify are detected directly using graph algorithms; 
whereas our SMT-based approach adds constraints only over existing variables 
to satisfy the properties that characterize a good metering function. Addition- 
ally, both approaches differ both in the goal (CTL* verification vs. inference of 
lower-bounds) and the technologies applied (CTL model checkers vs. Max-SMT 
solvers). 


Abstract. We introduce new algorithms for computing non-termination
sensitive control dependence (NTSCD) and decisive order dependence 
(DOD). These relations on vertices of a control flow graph have many 
applications including program slicing and compiler optimizations. Our 
algorithms are asymptotically faster than the current algorithms. We 
also show that the original algorithms for computing NTSCD and DOD 
may produce incorrect results. We implemented the new as well as fixed 
versions of the original algorithms for the computation of NTSCD and 
DOD. Experimental evaluation shows that our algorithms dramatically 
outperform the original ones. 


1 Introduction 


Control dependencies between program statements have been studied since the 1970s.
They have important applications in compiler optimizations [12,14,16], pro- 
gram analysis [9,19,36], and program transformations, especially program slic- 
ing [1,9,22,26,37]. Slicing is used in many areas including testing, debugging, 
parallelization, reverse engineering, program analysis and verification [17,28]. 

Informally, two statements in a program are control dependent if one directly 
controls the execution of the other in some way. This is typically the case for 
if statements and their bodies. Control dependencies are nowadays classified as 
weak (non-termination insensitive) if they assume that a given program always 
terminates, or as strong (non-termination sensitive) if they do not have this 
assumption [13]. We illustrate the difference on the control flow graph in Fig. 
1. Node a controls whether b or c (and then d) is going to be executed, so b, c, 
and d are control dependent on a (the convention is to display dependence as 
edges in the “controls” direction). Similarly, b controls the execution of c and d, 
as these nodes may be bypassed by going from b to e. Note also that d controls 
whether d is going to be executed in the future and thus is control dependent on 
itself. However, c does not control d as any path from c hits d. All dependencies 
mentioned so far are weak, namely standard control dependencies as defined by 
Ferrante et al. [16]. Weak control dependence assumes that the program always 
terminates, in particular, that the loop over d cannot iterate forever. As a result,
e is reached by all executions and thus it is not weakly control dependent on any
node. However, e is strongly control dependent on b and d. Indeed, if we assume
that some executions can loop over d forever, then reaching e is clearly controlled
by d and also by b as it can send the execution directly to e.

Fig. 1. An example of a control flow graph and control dependencies (red edges).
The dotted dependencies are additional non-termination sensitive control dependencies.
(Color figure online)

This paper is concerned with the computation of two prominent strong 
control dependencies introduced by Ranganath et al. [32,33], namely non- 
termination sensitive control dependence (NTSCD) and decisive order depen- 
dence (DOD). NTSCD is studied in Sect. 3, which follows after preliminaries in 
Sect. 2. We first recall the definition of NTSCD and the algorithm of Ranganath 
et al. [33] for its computation. Then we show a flaw in the algorithm and suggest a 
fix. Finally, we introduce a new algorithm for the computation of NTSCD. Given
a control flow graph with $|V|$ nodes, the new algorithm runs in time $O(|V|^2)$,
while the algorithm of Ranganath et al. runs in time $O(|V|^4 \cdot \log |V|)$ and its
fixed version in time $O(|V|^5)$. We show an NTSCD relation of size $\Theta(|V|^2)$, which
means that our algorithm is asymptotically optimal.

The DOD relation captures the cases when one node controls the execution 
order of two other nodes. Roughly speaking, nodes {b, c} are DOD on a whenever 
all executions passing through a eventually reach both b and c and a controls 
which is reached first. Ranganath et al. [33] proved that the relation is empty 
for reducible graphs [21], i.e., graphs where every cycle has a single entry point. 
Control flow graphs of structured programs are reducible, but irreducible graphs 
may arise for example in the following situations [11,33,35]: 


— unstructured coding by a human, which is rather rare nowadays, 

— compilation into unstructured code representation like JVM bytecode, 

— tail call recursion optimization during compilation, 

— when the control flow graph is interprocedural — in this case, irreducibility 
may be introduced by recursion or exceptions handling, 

— by reversing a control flow graph containing, for example, break statements,

— when the control flow graph is not generated from a program, but, e.g., from a
finite state machine.


The DOD relation is important (together with NTSCD) when we want to slice 
possibly non-terminating programs with irreducible control flow graphs and pre- 
serve their termination properties as well as data integrity [1,33]. This is a 
common requirement when slicing is used as a preprocessing step before pro- 
gram verification [9,23,26], worst-case execution time analysis [29], information 
flow analysis [18,19], analysis of concurrent programs [18] with busy-waiting 
synchronization or synchronization where possible spurious wake-ups of threads
are guarded by loops (e.g., programs using the pthread library), and analysis of 
reactive systems and generic state-based models [2, 24,33]. 

The DOD relation is studied in Sect. 4, where we recall its definition, discuss
the algorithm of Ranganath et al. for DOD [33], and show that this algorithm
also contains a flaw. Fortunately, this flaw can be easily fixed without changing
the complexity of the algorithm. Further, we develop a theory that underpins
our new algorithm for the computation of DOD. Due to space limitations,
proofs of theorems can be found only in the extended version of this paper [8].
The new algorithm, presented at the end of the section, computes DOD in time
$O(|V|^3)$, while the original as well as the fixed version of the algorithm of
Ranganath et al. runs in $O(|V|^5 \cdot \log |V|)$. We show a DOD relation of size $\Theta(|V|^3)$,
which means that our algorithm is again asymptotically optimal.

Section 5 focuses on control closures (CC) introduced by Danicic et al. [13],
which generalize control dependence to arbitrary directed graphs. It is known
that the strong (i.e., non-termination sensitive) control closure for a set of nodes
containing the starting node is equivalent to the closure under NTSCD and DOD
relations. Hence, our algorithms for NTSCD and DOD can be used to compute
strong CC in time $O(|V|^3)$ on control flow graphs, while the original algorithm
by Danicic et al. [13] runs in $O(|V|^4)$.

Our theoretical contribution to computation of strong control dependencies 
is summarized in Table 1. Section 6 presents experimental evaluation showing 
that our algorithms are indeed dramatically faster than the original ones. The 
paper is concluded with Sect. 7. 


1.1 Related Work 


The first paper concerned with control dependence is due to Denning and Den- 
ning [15], who used control dependence to certify that flow of information in a 
program is secure. Weiser [37], Ottenstein and Ottenstein [30], and Ferrante et 
al. [16] used control dependence in program slicing, which is also the motivation
for most of the later research in this area. These “classical” papers study
control dependence in terminating programs with a unique exit node eventually 
reached by every execution. These restrictions have been gradually removed. 


Table 1. Overview of discussed algorithms and their complexities on CFGs


Relation/closure     | Algorithm                                    | Complexity
NTSCD (Sect. 3)      | Original algorithm by Ranganath et al. [33]  | $O(|V|^4 \cdot \log |V|)$
                     | Fixed algorithm by Ranganath et al. [33]     | $O(|V|^5)$
                     | New algorithm                                | $O(|V|^2)$
DOD (Sect. 4)        | Original algorithm by Ranganath et al. [33]  | $O(|V|^5 \cdot \log |V|)$
                     | Fixed algorithm by Ranganath et al. [33]     | $O(|V|^5 \cdot \log |V|)$
                     | New algorithm                                | $O(|V|^3)$
Strong CC (Sect. 5)  | Original algorithm by Danicic et al. [13]    | $O(|V|^4)$
                     | New NTSCD-and-DOD-based algorithm            | $O(|V|^3)$


Podgurski and Clarke [31] defined the first strong control dependence that
does not assume termination of the program.¹ However, their definitions and
algorithms still require programs to have a unique exit node.

Bilardi and Pingali [5] introduced a framework that uses a generalized dominance
relation on graphs. In their framework, they are able to compute Podgurski
and Clarke’s control dependence in $O(|E| + |V|^2)$ time for a directed graph $(V, E)$
with a unique exit node. In theory, NTSCD could be computed in their framework.
However, computing the augmented post-dominator tree (the central data
structure of their framework) requires the unique exit node as it starts with
the post-dominator tree and, mainly, is much more complicated compared to our
algorithm for NTSCD [5].

Chen and Rosu [10] introduced a parametric approach where loops can be 
annotated with information about termination. The resulting control depen- 
dence is somewhere between the classical and Podgurski and Clarke’s control 
dependence, the two being the extremes. 

The notions of NTSCD and DOD were introduced by Ranganath et al.
[32,33] in order to slice reactive systems, e.g., operating systems or controllers of
embedded devices. They also generalized classical (non-termination insensitive)
control dependence to graphs without a unique exit point (further investigated,
e.g., by Androutsopoulos et al. [3]) and provided several relaxed versions of DOD.

Danicic et al. [13] introduced weak and strong control closures (CC) that 
generalize weak and strong control dependence (thus also NTSCD) to arbitrary 
graphs. They provide algorithms for the computation of minimal closures that
run in $O(|V|^3)$ (weak CC) and $O(|V|^4)$ (strong CC) on graphs with |V| nodes.

An orthogonal study of control dependence that arises between statements 
in different procedures (e.g., due to calls to exit()) was carried out by Loyall
and Mathisen [27], Harrold et al. [20], and Sinha et al. [34]. 


2 Preliminaries 


A finite directed graph is a pair G = (V, E), where V is a finite set of nodes and
$E \subseteq V \times V$ is a set of edges. If there is an edge $(m, n) \in E$, then n is called
a successor of m, m is a predecessor of n, and the edge is an outgoing edge of
m. Given a node n, Successors(n) and Predecessors(n) denote the sets of all its
successors and predecessors, respectively. A path from a node $n_1$ is a nonempty
finite or infinite sequence $n_1 n_2 \ldots \in V^+ \cup V^\omega$ of nodes such that there is an edge
$(n_i, n_{i+1}) \in E$ for each pair $n_i, n_{i+1}$ of adjacent nodes in the sequence. A path
is called maximal if it cannot be prolonged, i.e., it is infinite or the last node of
the path has no outgoing edge. A node m is reachable from a node n if there
exists a finite path such that its first node is n and its last node is m.

We say that a graph is a cycle if it is isomorphic to a graph (V, E) where
$V = \{n_1, \ldots, n_k\}$ for some $k > 0$ and $E = \{(n_1, n_2), (n_2, n_3), \ldots, (n_{k-1}, n_k),
(n_k, n_1)\}$. A cycle unfolding is a path in the cycle that contains each node
precisely once.

¹ Podgurski and Clarke [31] called their control dependence weak control dependence
as it is a superset of classical control dependence. Nowadays, we use the terms weak
and strong precisely in the opposite meaning [13].

In this paper, we consider programs represented by control flow graphs, where 
nodes correspond to program statements and edges model the flow of control 
between the statements. As control dependence reflects only the program struc- 
ture, our definition of a control flow graph does not contain any statements. 
Our definition also does not contain any start or exit nodes as these are not 
important for the problems we study in this paper. 


Definition 1 (Control flow graph, CFG). A control flow graph (CFG) is a 
finite directed graph G = (V, E) where each node $v \in V$ has at most two outgoing
edges. Nodes with exactly two outgoing edges are called predicate nodes or simply 
predicates. The set of all predicates of a CFG G is denoted by Predicates(G). 
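
Definition 1 can be made concrete with a small sketch. The encoding below (a dictionary mapping each node to the list of its successors) and the helper names `check_cfg`, `predecessors`, and `predicates` are assumptions chosen for illustration; they are not taken from the implementation discussed later in the paper.

```python
from typing import Dict, Hashable, List

Node = Hashable
CFG = Dict[Node, List[Node]]  # assumed encoding: node -> list of its successors


def check_cfg(succ: CFG) -> None:
    """Raise ValueError if `succ` is not a CFG in the sense of Definition 1."""
    for node, targets in succ.items():
        if len(set(targets)) != len(targets):
            raise ValueError(f"duplicate edge from {node!r}")
        if len(targets) > 2:
            raise ValueError(f"{node!r} has more than two outgoing edges")
        for t in targets:
            if t not in succ:
                raise ValueError(f"edge to unknown node {t!r}")


def predecessors(succ: CFG) -> CFG:
    """Invert the successor map: node -> list of its predecessors."""
    pred: CFG = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    return pred


def predicates(succ: CFG) -> List[Node]:
    """Predicates(G): nodes with exactly two outgoing edges."""
    return [n for n, targets in succ.items() if len(targets) == 2]


# A hypothetical diamond-shaped CFG: p branches to a and b, both continue to c.
EXAMPLE: CFG = {"p": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
```

Note that the encoding carries no statements and no start or exit node, in line with the definition above.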


3 Non-termination Sensitive Control Dependence 


This section recalls the definition of NTSCD by Ranganath et al. [32] and their
algorithm for computing NTSCD. Then we show that the algorithm can produce 
incorrect results and introduce a new algorithm that is asymptotically faster. 


Definition 2 (Non-termination sensitive control dependence, NTSCD).
Given a CFG G = (V, E), a node $n \in V$ is non-termination sensitive control
dependent (NTSCD) on a predicate node $p \in$ Predicates(G), written
$p \xrightarrow{\mathrm{NTSCD}} n$, if p has two successors $s_1$ and $s_2$ such that


- all maximal paths from $s_1$ contain n, and
- there exists a maximal path from $s_2$ that does not contain n.


3.1 Algorithm of Ranganath et al. [33] for NTSCD 


The algorithm is presented in Algorithm 1. Its central data structure is a two- 
dimensional array S where for each node n and for each predicate node p with 
successors r and s, S[n, p] always contains a subset of $\{t_{pr}, t_{ps}\}$. Intuitively, $t_{pr}$
should be added to S[n,p] if n appears on all maximal paths from p that start 
with the prefix pr. The workbag holds the set of nodes n for which some S[n, p] 
value has been changed and this change should be propagated. The first part of 
the algorithm initializes the array S with the information that each successor 
r of a predicate node p is on all maximal paths from p starting with pr. The 
main part of the algorithm then spreads the information about the reachability 
on all maximal paths in the forward manner. Finally, the last part computes the 
NTSCD relation according to Definition 2 and with use of the information in S. 
The algorithm runs in time $O(|E| \cdot |V|^3 \cdot \log |V|)$ [33] for a CFG G = (V, E).
The log |V| factor comes from set operations. Since every node in a CFG has at
most 2 outgoing edges, we can simplify the complexity to $O(|V|^4 \cdot \log |V|)$.
Although the correctness of the algorithm has been proved [32, Theorem 7], 
Fig. 2 presents an example where the algorithm provides an incorrect answer. 
Algorithm 1: The NTSCD algorithm by Ranganath et al. [33]


Input: a CFG G = (V, E)
Output: a potentially incorrect NTSCD relation stored in ntscd


 1  Set S[n, p] = ∅ for all n ∈ V and p ∈ Predicates(G)     // Initialization
 2  workbag ← ∅
 3  for p ∈ Predicates(G) do
 4      for r ∈ Successors(p) do
 5          S[r, p] ← {t_pr}
 6          workbag ← workbag ∪ {r}
 7
 8  while workbag ≠ ∅ do                                     // Computation of S
 9      n ← pop from workbag
10      if Successors(n) = {s} for some s ≠ n then           // One successor case
11          for p ∈ Predicates(G) do
12              if S[n, p] \ S[s, p] ≠ ∅ then
13                  S[s, p] ← S[s, p] ∪ S[n, p]
14                  workbag ← workbag ∪ {s}
15      if |Successors(n)| > 1 then                          // Multiple successors case
16          for m ∈ V do
17              if |S[m, n]| = |Successors(n)| then
18                  for p ∈ Predicates(G) \ {n} do
19                      if S[n, p] \ S[m, p] ≠ ∅ then
20                          S[m, p] ← S[m, p] ∪ S[n, p]
21                          workbag ← workbag ∪ {m}
22
23  ntscd ← ∅                                                // Computation of NTSCD
24  for n ∈ V do
25      for p ∈ Predicates(G) do
26          if 0 < |S[n, p]| < |Successors(p)| then
27              ntscd ← ntscd ∪ {$p \xrightarrow{\mathrm{NTSCD}} n$}


The first part of the algorithm initializes S as shown in the figure and sets
workbag to {2, 6, 3, 4}. Then any node from workbag can be popped and processed.
Let us apply the policy used for queues: always pop the oldest element in
workbag. Hence, we pop 2 and nothing happens as the condition on line 17 is not
satisfied for any m. This also means that the symbol $t_{12}$ is not propagated any
further. Next we pop 6, which has no effect as 6 has no successor. By processing
3 and 4, $t_{23}$ and $t_{24}$ are propagated to S[5, 2] and 5 is added to the workbag.
Finally, we process 5 and set S[6, 2] to $\{t_{23}, t_{24}\}$. The final content of S is
provided in the figure. Unfortunately, the information in S is sound but incomplete.
In other words, if $t_{pr} \in S[n, p]$, then n is indeed on all maximal paths from p
starting with pr, but the opposite implication does not hold. In particular, $t_{12}$ is
missing in S[5, 1] and S[6, 1]. Consequently, the last part of the algorithm computes
an incorrect NTSCD relation: it correctly identifies $1 \xrightarrow{\mathrm{NTSCD}} 2$,
$2 \xrightarrow{\mathrm{NTSCD}} 3$, and $2 \xrightarrow{\mathrm{NTSCD}} 4$, but it also incorrectly
produces $1 \xrightarrow{\mathrm{NTSCD}} 6$ and misses $1 \xrightarrow{\mathrm{NTSCD}} 5$.
S after initialization:

    S[2, 1] = $\{t_{12}\}$
    S[6, 1] = $\{t_{16}\}$
    S[3, 2] = $\{t_{23}\}$
    S[4, 2] = $\{t_{24}\}$


Final S when nodes are popped in the order

    2, 6, 3, 4, 5 (oldest first)         3, 4, 2, 5, 6 (correct)
    S[2, 1] = $\{t_{12}\}$               S[2, 1] = $\{t_{12}\}$
    S[6, 1] = $\{t_{16}\}$               S[3, 2] = $\{t_{23}\}$
    S[3, 2] = $\{t_{23}\}$               S[4, 2] = $\{t_{24}\}$
    S[4, 2] = $\{t_{24}\}$               S[5, 1] = $\{t_{12}\}$
    S[5, 2] = $\{t_{23}, t_{24}\}$       S[5, 2] = $\{t_{23}, t_{24}\}$
    S[6, 2] = $\{t_{23}, t_{24}\}$       S[6, 1] = $\{t_{12}, t_{16}\}$
                                         S[6, 2] = $\{t_{23}, t_{24}\}$


Fig. 2. An example that shows the incorrectness of the NTSCD algorithm by Ran- 
ganath et al. [33]. Solid red edges depict the dependencies computed by the algorithm 
when it always pops the oldest element in workbag. The crossed dependence is incorrect. 
The dotted dependence is missing in the result. 


A necessary condition to get the correct result is to process 2 only after 3 and 4
are processed and $S[5, 2] = \{t_{23}, t_{24}\}$. For example, one obtains the correct S
(also shown in the figure) when the nodes are processed in the order 3, 4, 2, 5, 6.

The algorithm is clearly sensitive to the order of popping nodes from workbag. 
We are currently not sure whether for each CFG there exists an order that 
leads to the correct result. An easy way to fix the algorithm is to ignore the 
workbag and repeatedly execute the body of the while loop (lines 10-21) for all 
$n \in V$ until the array S reaches a fixpoint. However, this modification would
slow down the algorithm substantially. Computing the fixpoint needs $O(|V|^3)$
iterations over the loop body (lines 10-21 excluding lines 14 and 21 handling the
workbag) and one iteration of this loop body needs $O(|V|^2)$. Hence, the overall
time complexity of the fixed version is $O(|V|^5)$.
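
The order sensitivity and the fixpoint-based repair can be reproduced with the following sketch of Algorithm 1. Several things here are assumptions made for illustration: the function name `ntscd_ranganath`, the `policy` parameter, the encoding of $t_{pr}$ as the pair `(p, r)`, and the graph `FIG2`, whose edges are inferred from the walk-through above rather than copied from the figure.

```python
from collections import deque


def ntscd_ranganath(succ, policy="fifo"):
    """Sketch of Algorithm 1 with a selectable workbag policy.

    policy: "fifo" (pop the oldest node), "lifo" (pop the newest node), or
    "fixpoint" (ignore the workbag and sweep all nodes until S is stable).
    """
    nodes = sorted(succ)
    preds = [p for p in nodes if len(succ[p]) == 2]
    S = {(n, p): set() for n in nodes for p in preds}
    workbag = deque()

    def push(x):
        if x not in workbag:              # the workbag is a set: no duplicates
            workbag.append(x)

    def process(n):
        """Loop body (lines 10-21); returns True if some S[., .] changed."""
        changed = False
        if len(succ[n]) == 1 and succ[n][0] != n:       # one successor case
            s = succ[n][0]
            for p in preds:
                if S[n, p] - S[s, p]:
                    S[s, p] |= S[n, p]
                    push(s)
                    changed = True
        if len(succ[n]) > 1:                            # multiple successors case
            for m in nodes:
                if len(S[m, n]) == len(succ[n]):
                    for p in preds:
                        if p != n and S[n, p] - S[m, p]:
                            S[m, p] |= S[n, p]
                            push(m)
                            changed = True
        return changed

    for p in preds:                                     # initialization
        for r in succ[p]:
            S[r, p] = {(p, r)}                          # (p, r) stands for t_pr
            push(r)

    if policy == "fixpoint":
        # The list comprehension forces a full sweep before testing for change.
        while any([process(n) for n in nodes]):
            pass
    else:
        while workbag:
            process(workbag.popleft() if policy == "fifo" else workbag.pop())

    return {(p, n) for n in nodes for p in preds
            if 0 < len(S[n, p]) < len(succ[p])}


# Edges inferred from the walk-through of Fig. 2 (an assumption).
FIG2 = {1: [2, 6], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
print(ntscd_ranganath(FIG2, "fifo"))      # contains (1, 6), lacks (1, 5)
print(ntscd_ranganath(FIG2, "fixpoint"))  # contains (1, 5), lacks (1, 6)
```

On this particular graph, the last-in-first-out policy also happens to process node 2 only after nodes 3 and 4 and therefore yields the correct relation; this is a property of the example, not a general guarantee.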


3.2 New Algorithm for NTSCD 


We have designed and implemented a new algorithm computing NTSCD. Our 
algorithm is correct, significantly simpler and asymptotically faster than the 
original algorithm of Ranganath et al. [33]. 

The new algorithm calls for each node n a procedure that identifies all 
NTSCD dependencies of n on predicate nodes. The procedure works in the fol- 
lowing steps. 


1. Color n red. 

2. Pick an uncolored node such that it has some successors and they all are red. 
Color the node red. Repeat this step until no such node exists. 

3. For each predicate node p that has a red successor and an uncolored one,
output $p \xrightarrow{\mathrm{NTSCD}} n$.
Algorithm 2: The new NTSCD algorithm

Input: a CFG G = (V, E)
Output: the NTSCD relation stored in ntscd


 1  Procedure VISIT(n)                        // Auxiliary procedure
 2      n.counter ← n.counter - 1
 3      if n.counter = 0 ∧ n.color ≠ red then
 4          n.color ← red
 5          for m ∈ Predecessors(n) do
 6              VISIT(m)
 7
 8  Procedure COMPUTE(n)                      // Coloring the graph red for a given n
 9      for m ∈ V do
10          m.color ← uncolored
11          m.counter ← |Successors(m)|
12      n.color ← red
13      for m ∈ Predecessors(n) do
14          VISIT(m)
15
16  ntscd ← ∅                                 // Computation of NTSCD
17  for n ∈ V do
18      COMPUTE(n)
19      for p ∈ Predicates(G) do
20          if p has a red successor and an uncolored successor then
21              ntscd ← ntscd ∪ {$p \xrightarrow{\mathrm{NTSCD}} n$}


Unlike the algorithm of Ranganath et al., which works in a forward manner, our
algorithm spreads the information about the reachability of n on all maximal
paths in the backward direction starting from n.

The algorithm is presented in Algorithm 2. The procedure COMPUTE(n) 
implements the first two steps mentioned above. In the second step, it does 
not search over all nodes to pick the next node to color. Instead, it maintains 
the count of uncolored successors for each node. Once the count drops to 0 for a 
node, the node is colored red and the search continues with predecessors of this 
node. The third step is implemented directly in the main loop of the algorithm. 
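
A compact sketch of Algorithm 2 follows. It replaces the recursive VISIT by an explicit stack, which is an implementation choice of this sketch and not part of the pseudocode; the names `red_nodes` and `ntscd`, the dictionary encoding of the CFG, and the graph `FIG2` (inferred from the discussion of Fig. 2) are likewise assumptions.

```python
def red_nodes(succ, pred, n):
    """COMPUTE(n): return the set of nodes colored red for the target node n."""
    counter = {m: len(succ[m]) for m in succ}   # uncolored successors per node
    red = {n}
    stack = list(pred[n])                       # pending VISIT calls
    while stack:
        m = stack.pop()
        counter[m] -= 1                         # one more successor of m is red
        if counter[m] == 0 and m not in red:
            red.add(m)
            stack.extend(pred[m])
    return red


def ntscd(succ):
    """Return the NTSCD relation as a set of pairs (p, n), read as p -> n."""
    pred = {m: [] for m in succ}
    for m, targets in succ.items():
        for t in targets:
            pred[t].append(m)
    result = set()
    for n in succ:
        red = red_nodes(succ, pred, n)
        for p, targets in succ.items():
            # p is a predicate with exactly one red successor
            if len(targets) == 2 and (targets[0] in red) != (targets[1] in red):
                result.add((p, n))
    return result


# Edges inferred from the discussion of Fig. 2 (an assumption).
FIG2 = {1: [2, 6], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
print(sorted(ntscd(FIG2)))
```

Each stack entry corresponds to one call of VISIT, so every edge is handled at most once per target node n, exactly as in the recursive formulation.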

To prove that the algorithm is correct, we basically need to show that when 
COMPUTE(n) finishes, a node m is red iff all maximal paths from m contain n. 
We start with a simple observation. 


Lemma 1. After COMPUTE(n) finishes, a node m is red if and only if m = n
or m has a positive number of successors and all of them are red.


Proof. For each node m, the counter is initialized to the number of its successors
and it is decreased by calls to VISIT(m) each time a successor of m gets red. When
the counter drops to 0 (i.e., all successors of the node are red), the node is colored
red. Therefore, if m is red, it got red either on line 12 and m = n, or $m \neq n$
and m is red because all its successors got red (it must have a positive number
of successors, otherwise the counter could not be 0 after its decrement). In the
other direction, if m = n, it gets red on line 12. If it has a positive number of
successors which all get red, the node is colored red by the argument above.


Theorem 1. After COMPUTE(n) finishes, for each node m it holds that m is 
red if and only if all maximal paths from m contain n. 


Proof. (“$\Leftarrow$”) We prove this implication by contraposition. Assume that m is
an uncolored node. Lemma 1 implies that each uncolored node has an uncolored 
successor (if it has any). Hence, we can construct a maximal path from m con- 
taining only uncolored nodes simply by always going to an uncolored successor, 
either up to infinity or up to a node with no successors. This uncolored maximal 
path cannot contain n which is red. 

(“$\Rightarrow$”) For the sake of contradiction, assume that there is a red node m and
a maximal path from m that does not contain n. Lemma 1 implies that all nodes 
on this path are red. If the maximal path is finite, it has to end with a node 
without any successor. Lemma 1 says that such a node can be red if and only if 
it is n, which is a contradiction. If the maximal path is infinite, it must contain 
a cycle since the graph is finite. Let r be the node on this cycle that has been 
colored red as the first one. Let s be the successor of r on the cycle. Recall that 
$r \neq n$ as the maximal path does not contain n. Hence, node r could be colored
red only when all its successors including s were already red. This contradicts 
the fact that r was colored red as the first node on the cycle. 


To determine the complexity of our algorithm on a CFG (V, E), we first 
analyze the complexity of one run of COMPUTE(n). The lines 9-11 iterate over 
all nodes. The crucial observation is that the procedure VISIT is called at most 
once for each edge $(m, m') \in E$ of the graph: to decrease the counter of m
when m' gets red. Hence, the procedure COMPUTE(n) runs in $O(|V| + |E|)$. This
procedure is called on line 18 for each node n. Finally, lines 20-21 are executed
for each pair of node n and predicate node p. This gives us the overall complexity
$O((|V| + |E|) \cdot |V| + |V|^2) = O((|V| + |E|) \cdot |V|)$. Since in control flow graphs it
holds that $|E| \leq 2|V|$, the complexity can be simplified to $O(|V|^2)$.

Note that our algorithm is asymptotically optimal as there are CFGs with
NTSCD relations of size $\Theta(|V|^2)$. For example, the CFG in Fig. 3 has |V| = 2k + 1
nodes and the corresponding NTSCD relation


$\{n_i \xrightarrow{\mathrm{NTSCD}} m_j \mid i, j \in \{1, \ldots, k\}\} \cup \{n_i \xrightarrow{\mathrm{NTSCD}} n_{i+1} \mid i \in \{1, \ldots, k-1\}\}$


is of size $k^2 + k - 1 \in \Theta(|V|^2)$.
Fig. 4. An example of an irreducible CFG. There are no NTSCD dependencies, but a
and b are DOD on p.


4 Decisive Order Dependence 


There are control dependencies not captured by NTSCD. For example, consider 
the CFG in Fig. 4. Nodes a and b are not NTSCD on p as they lie on all maximal 
paths from p. However, p controls which of a and b is executed first. Ranganath 
et al. [33] introduced the DOD relation to capture such dependencies. 


Definition 3 (Decisive order dependence, DOD). Let G = (V, E) be a
CFG and $p, a, b \in V$ be three distinct nodes such that p is a predicate node
with successors $s_1$ and $s_2$. Nodes a, b are decisive order-dependent (DOD) on p,
written $p \xrightarrow{\mathrm{DOD}} \{a, b\}$, if


- all maximal paths from p contain both a and b,
- all maximal paths from $s_1$ contain a before any occurrence of b, and
- all maximal paths from $s_2$ contain b before any occurrence of a.


The importance of DOD for slicing of irreducible programs is discussed in 
the introduction. 


4.1 Algorithm of Ranganath et al. [33] for DOD 


Ranganath et al. provided an algorithm that computes the DOD relation for a
given CFG G = (V, E) in time $O(|V|^4 \cdot |E| \cdot \log |V|)$, which amounts to
$O(|V|^5 \cdot \log |V|)$ on CFGs [33, Fig. 7]. The algorithm contains one unclear point. For each
triple of nodes $p, a, b \in V$ such that $p \in$ Predicates(G) and $a \neq b$, the algorithm
executes the following check and if it succeeds, then $p \xrightarrow{\mathrm{DOD}} \{a, b\}$ is reported:


REACHABLE(a, b, G) $\wedge$ REACHABLE(b, a, G) $\wedge$ DEPENDENCE(p, a, b, G)    (1)
Fig. 5. An example that shows the incorrectness of the DOD algorithm by Ranganath
et al. [33]


The procedure DEPENDENCE(p, a, b, G) returns true iff a is on all maximal paths 
from one successor of p before any occurrence of b and b is on all maximal 
paths from the other successor of p before any occurrence of a. The procedure 
REACHABLE is specified only by words [33, description of Fig. 7] as follows: 


REACHABLE(a, b, G) returns true if b is reachable from a in the graph G. 


Unfortunately, this algorithm can provide incorrect results. For example, con- 
sider the CFG in Fig. 5. Nodes p,a,b satisfy the formula (1): a appears on all 
maximal paths from one successor of p (namely a) before any occurrence of b, 
and b appears on all maximal paths from the other successor of p (which is b) 
before any occurrence of a. At the same time, a and b are reachable from each 
other. However, it is not true that $p \xrightarrow{\mathrm{DOD}} \{a, b\}$, because a and b do not lie on
all maximal paths from p (the first condition of Definition 3 is violated). 

The algorithm can be fixed by modifying the procedure REACHABLE(a, b, G) 
to return true if b is on all maximal paths from a. The modified procedure can 
be implemented with use of the procedure COMPUTE(b) of Algorithm 2. As the 
procedure COMPUTE(b) runs in O(|V| + |E|), the modification does not increase 
the overall complexity of the algorithm. By comparing the fixed and the original 
version of REACHABLE(a, b, G), one can readily confirm that the original version 
produces supersets of DOD relations. 
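
The difference between the two readings of REACHABLE can be illustrated on a small graph. The graph `G5` below is a hypothetical CFG with the properties described for Fig. 5; it is not claimed to be the graph drawn there. The function names are also assumptions of this sketch.

```python
def reachable(succ, a, b):
    """Original REACHABLE: is there a finite path from a to b?"""
    seen, stack = {a}, [a]
    while stack:
        n = stack.pop()
        if n == b:
            return True
        for s in succ[n]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False


def on_all_maximal_paths(succ, a, b):
    """Fixed REACHABLE: is b on all maximal paths from a?

    Uses the backward coloring of COMPUTE(b) from Algorithm 2.
    """
    pred = {m: [] for m in succ}
    for m, targets in succ.items():
        for t in targets:
            pred[t].append(m)
    counter = {m: len(succ[m]) for m in succ}
    red = {b}
    stack = list(pred[b])
    while stack:
        m = stack.pop()
        counter[m] -= 1
        if counter[m] == 0 and m not in red:
            red.add(m)
            stack.extend(pred[m])
    return a in red


# Hypothetical graph: a and b reach each other, but the maximal path p, a, c avoids b.
G5 = {"p": ["a", "b"], "a": ["b", "c"], "b": ["a"], "c": []}

print(reachable(G5, "a", "b"), reachable(G5, "b", "a"))   # original check passes
print(on_all_maximal_paths(G5, "a", "b"))                  # fixed check rejects
```

With the original procedure, formula (1) holds for p, a, b in `G5`, although the first condition of Definition 3 is violated; with the fixed procedure, the first conjunct fails and no dependence is reported.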


4.2 New Algorithm for DOD: Crucial Observations 


As in the case of NTSCD, we have designed a new algorithm for the computation
of DOD, which is relatively simple and asymptotically faster than the DOD 
algorithm of Ranganath et al. [33]. 

Given a CFG, our algorithm first computes for each predicate p the set $V_p$ of
nodes that are on all maximal paths from p. The definition of DOD implies that
only pairs of nodes in $V_p$ can be DOD on p. For every predicate p we build an
auxiliary graph $A_p$ with nodes $V_p$ and from this graph we get all pairs of nodes
that are DOD on p. The graph $A_p$ is defined as follows.


Definition 4 (V'-interval [13]). Given a CFG G = (V, E) and a subset $V' \subseteq V$,
a path $n_1 \ldots n_k$ such that $k \geq 2$, $n_1, n_k \in V'$, and $\forall\, 1 < i < k : n_i \notin V'$ is
called a V'-interval from $n_1$ to $n_k$ in G.


In other words, a V'-interval is a finite path with at least one edge that has
the first and the last node in V' but no other node on the path is in V'.
Definition 5 (Graph $A_p$²). Given a CFG G = (V, E), a predicate node $p \in$
Predicates(G) and the subset $V_p \subseteq V$ of nodes that are on all maximal paths
from p, the graph $A_p = (V_p, E_p)$ is the graph where


$E_p = \{(x, y) \mid \text{there exists a } V_p\text{-interval from } x \text{ to } y \text{ in } G\}$.


In this subsection, we describe the connections between these graphs and 
DOD that underpin our algorithm. The proofs of the theorems can be found in 
the extended version of this paper [8]. 

Given a predicate p of a CFG G, the graph $A_p$ does not have to be a CFG as
nodes in $A_p$ can have more than two successors. However, $A_p$ preserves exactly
all possible orders of the first occurrences of nodes in $V_p$ on maximal paths in G
starting from p. More precisely, for each maximal path from p in G, there exists
a maximal path from p in $A_p$ with the same order of the first occurrences of all
nodes in $V_p$, and vice versa. Further, it turns out that there are no nodes DOD
on p unless $A_p$ has the right shape.
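
The construction of the edge set of $A_p$ can be sketched as a search that starts in each node of $V_p$ and stops as soon as it meets another node of $V_p$. The function name `build_ap_edges` and the example graph are assumptions; the set $V_p$ is supplied by hand here, whereas the algorithm of this section computes it.

```python
def build_ap_edges(succ, vp):
    """Return E_p: pairs (x, y) such that G has a V_p-interval from x to y."""
    edges = set()
    for x in vp:
        seen = set()                 # nodes outside V_p already expanded
        stack = list(succ[x])
        while stack:
            y = stack.pop()
            if y in vp:              # interval ends at the first V_p node
                edges.add((x, y))
                continue
            if y in seen:
                continue
            seen.add(y)
            stack.extend(succ[y])
    return edges


# Hypothetical CFG: p branches to a directly and to b via the extra node x;
# a and b then alternate forever, so V_p = {p, a, b}.
G = {"p": ["a", "x"], "x": ["b"], "a": ["b"], "b": ["a"]}
VP = {"p", "a", "b"}

print(sorted(build_ap_edges(G, VP)))
```

In this example the resulting graph consists of the cycle on a and b plus two edges from p to the cycle, which is the shape required by the definition that follows.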


Definition 6 (Right shape of $A_p$). Given a CFG G, a predicate node $p \in$
Predicates(G) and the graph $A_p = (V_p, E_p)$, we say that $A_p$ has the right shape
if it consists only of a cycle and the node p with at least two edges going to some
nodes on the cycle (i.e., the nodes of $V_p \setminus \{p\}$ can be labeled $n_1, \ldots, n_k$ such
that $E_p = \{(n_1, n_2), (n_2, n_3), \ldots, (n_{k-1}, n_k), (n_k, n_1)\} \cup \{(p, n_i) \mid i \in I\}$ for some
$I \subseteq \{1, \ldots, k\}$ with $|I| \geq 2$).


Figure 6 depicts an $A_p$ which has the right shape. In the following text, we
work only with $A_p$ graphs in the right shape.

Let $s_1$ and $s_2$ be the two successors of p in G. Note that $s_1$ and $s_2$ may, but
do not have to be in $A_p$. To compute the pairs of nodes that are DOD on p,
we need to know all possible orders of the first occurrences of nodes in $V_p$ on
the maximal paths in G starting in $s_1$ and $s_2$. Hence, for each successor $s_i$ we
compute the set $S_i$ of nodes that appear as the first node of $V_p$ on some maximal
path from $s_i$ in G. Formally, for $i \in \{1, 2\}$, we define


$S_i = \{n \in V_p \mid \text{there exists a path } s_i \ldots n \in (V \setminus V_p)^*.V_p \text{ in } G\}$.


The nodes in $S_1 \cup S_2$ are exactly all the successors of p in $A_p$. Further, the
maximal paths from the nodes of $S_i$ in $A_p$ reflect exactly all possible orders
of the first occurrences of nodes in $V_p$ on maximal paths in G starting in $s_i$.
If $S_1$ and $S_2$ are not disjoint, then there exist two maximal paths in G, one
starting in $s_1$ and the other in $s_2$, that differ only in prefixes of nodes outside
$V_p$. The definition of DOD implies that there are no nodes DOD on p in this
case. Therefore we assume that $S_1$ and $S_2$ are disjoint.

The nodes in $S_i$ divide the cycle of $A_p$ into $s_i$-strips, which are parts of the
cycle starting with a node from $S_i$ and ending before the next node of $S_i$.


² Graph $A_p$ can be defined as the graph induced by $V_p$ in terms of Danicic et al. [13].
[Figure: the graph $A_p$ with its $s_1$-strips (blue) and $s_2$-strips (red)]


Fig. 6. An example of $A_p$ in the right shape. Strips are computed for $S_1 = \{n_1, n_7\}$
(blue nodes) and $S_2 = \{n_2, n_5\}$ (red nodes). (Color figure online)


Definition 7 ($s_i$-strip). Let $i \in \{1, 2\}$. An $s_i$-strip is a path
$n \ldots m \in S_i.(V_p \setminus S_i)^*$ in $A_p$ such that the successor of m in $A_p$ is a node in $S_i$.


An example of $A_p$ with $s_i$-strips is in Fig. 6. The $s_i$-strips directly say which
pairs of nodes of $V_p$ are in the same order on all maximal paths from $s_i$ in G.
In particular, a node a is before any occurrence of node b on all maximal paths
from a successor s of p in G if and only if there is an s-strip containing both a
and b where a is before b. As a corollary, we get the following theorem:


Theorem 2. Let p be a predicate with successors $s_1, s_2$ such that $A_p$ has the
right shape and $S_1 \cap S_2 = \emptyset$. Then nodes $a, b \in V_p$ are DOD on p if and only if


- there exists an $s_1$-strip in $A_p$ that contains a before b and
- there exists an $s_2$-strip in $A_p$ that contains b before a.


Consider again the $A_p$ in Fig. 6. The theorem implies that nodes $n_1, n_5$ are
DOD on p as they appear in $s_1$-strip $n_1 n_2 n_3 n_4 n_5 n_6$ and in $s_2$-strip $n_5 n_6 n_7 n_8 n_1$
in the opposite order. Nodes $n_1, n_6$ are DOD on p for the same reason.
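
The strips and the criterion of Theorem 2 can be checked mechanically on the cycle of Fig. 6. In the sketch below, the nodes $n_1, \ldots, n_8$ are encoded as the integers 1 to 8, and the function names `strips`, `ordered_pairs`, and `dod_pairs` are assumptions made for illustration. The quadratic enumeration of pairs is chosen for clarity only; it is not the method used by the algorithm of this section.

```python
def strips(cycle, starts):
    """Split the cycle into strips, each beginning at a node from `starts`."""
    i0 = next(i for i, n in enumerate(cycle) if n in starts)
    rotated = cycle[i0:] + cycle[:i0]        # begin the walk at a start node
    result = []
    for n in rotated:
        if n in starts:
            result.append([n])               # a new strip begins here
        else:
            result[-1].append(n)
    return result


def ordered_pairs(strip_list):
    """All pairs (a, b) such that a occurs before b inside one strip."""
    return {(s[i], s[j])
            for s in strip_list
            for i in range(len(s))
            for j in range(i + 1, len(s))}


def dod_pairs(cycle, s1, s2):
    """Theorem 2: {a, b} with a before b in an s1-strip and b before a in an s2-strip."""
    before1 = ordered_pairs(strips(cycle, s1))
    before2 = ordered_pairs(strips(cycle, s2))
    return {frozenset((a, b)) for (a, b) in before1 if (b, a) in before2}


CYCLE = [1, 2, 3, 4, 5, 6, 7, 8]             # n_1, ..., n_8 of Fig. 6
S1, S2 = {1, 7}, {2, 5}

print(strips(CYCLE, S1))                     # s1-strips
print(strips(CYCLE, S2))                     # s2-strips
print(dod_pairs(CYCLE, S1, S2))
```

The computed pairs are $\{n_1, n_5\}$ and $\{n_1, n_6\}$, in agreement with the discussion above.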

Using the previous theorem, we can find a regular language over $V_p$
such that there exist nodes a, b DOD on p iff some unfolding of the cycle in $A_p$
is in the language.


Theorem 3. Let p be a predicate with successors $s_1, s_2$ such that $A_p$ has the
right shape and $S_1 \cap S_2 = \emptyset$. Further, let $U = V_p \setminus (S_1 \cup S_2)$. There are some
nodes a, b DOD on p if and only if the cycle in $A_p$ has an unfolding of the form
$S_1.U^*.(S_2.U^*)^*.S_2.U^*.(S_1.U^*)^*$.


Finally, an unfolding of the mentioned form can be directly used for the 
computation of nodes that are DOD on p. 


Theorem 4. Let p be a predicate with successors $s_1, s_2$ such that $A_p$ has the
right shape and $S_1 \cap S_2 = \emptyset$. Further, let $A_p$ have an unfolding of the form
$S_1.U^*.(S_2.U^*)^*.S_2.U^*.(S_1.U^*)^*$ where $U = V_p \setminus (S_1 \cup S_2)$. Then there is exactly
one path $m_1 \ldots m_i \in S_1.U^*.S_2$ and exactly one path $o_1 \ldots o_j \in S_2.U^*.S_1$ on the
cycle. Moreover, $p \xrightarrow{\mathrm{DOD}} \{a, b\}$ if and only if $m_1 \ldots m_{i-1}$ contains a and
$o_1 \ldots o_{j-1}$ contains b (or the other way round).
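
Theorems 3 and 4 suggest a direct implementation with regular expressions: each node of an unfolding is mapped to one letter according to its membership in $S_1$, $S_2$, or U. The sketch below assumes that the unfolding starts in a node of $S_1$ and that $S_1$ and $S_2$ are disjoint; the function name `dod_from_unfolding` and the letter encoding are assumptions of this illustration.

```python
import re


def dod_from_unfolding(unfolding, s1, s2):
    """Return the DOD pairs for a cycle unfolding that starts in S1.

    Nodes are encoded as letters: '1' for S1, '2' for S2, 'u' for U.
    """
    word = "".join("1" if n in s1 else "2" if n in s2 else "u"
                   for n in unfolding)
    # Check of Theorem 3 for an unfolding starting in S1.
    if not re.fullmatch(r"(1u*)*(2u*)*(1u*)*", word):
        return set()
    # The path in S2.U*.S1 may wrap around, so append the first node again.
    ext_word = word + word[0]
    ext_nodes = unfolding + unfolding[:1]
    m = re.search(r"1u*2", ext_word)         # the path m_1 ... m_i
    o = re.search(r"2u*1", ext_word)         # the path o_1 ... o_j
    if m is None or o is None:
        return set()
    first = ext_nodes[m.start():m.end() - 1]     # m_1 ... m_{i-1}
    second = ext_nodes[o.start():o.end() - 1]    # o_1 ... o_{j-1}
    return {frozenset((a, b)) for a in first for b in second}


# The cycle of Fig. 6 with n_1, ..., n_8 encoded as 1, ..., 8.
print(dod_from_unfolding([1, 2, 3, 4, 5, 6, 7, 8], {1, 7}, {2, 5}))
# Alternating S1 and S2 nodes: the unfolding is rejected, no DOD pairs.
print(dod_from_unfolding([1, 2, 3, 4], {1, 3}, {2, 4}))
```

For the cycle of Fig. 6, the two extracted paths are $n_1 n_2$ and $n_5 n_6 n_7$, which yields the pairs $\{n_1, n_5\}$ and $\{n_1, n_6\}$.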
Algorithm 3: The algorithm computing V_n for all nodes n

Input: a CFG G = (V, E)
Output: V_n = {m ∈ V | m is on all maximal paths from n} for all n ∈ V


 1  Procedure VISIT(n, r)                     // Auxiliary procedure
 2      n.counter ← n.counter - 1
 3      if n.counter = 0 ∧ r ∉ V_n then
 4          V_n ← V_n ∪ {r}
 5          for m ∈ Predecessors(n) do
 6              VISIT(m, r)
 7
 8  Procedure COMPUTE(n)                      // ‘Coloring the graph red’ for a given n
 9      for m ∈ V do
10          m.counter ← |Successors(m)|
11      V_n ← V_n ∪ {n}
12      for m ∈ Predecessors(n) do
13          VISIT(m, n)
14
15  Procedure COMPUTEVpS                      // Computation of sets V_n for all nodes n
16      for n ∈ V do
17          V_n ← ∅
18      for n ∈ V do
19          COMPUTE(n)

4.3 New Algorithm for DOD: Pseudocode and Complexity 


Our DOD algorithm is shown in Algorithms 3 and 4. As nearly all applications 
of DOD need also NTSCD, we present the algorithm with a simple extension 
(gray lines with asterisks) that simultaneously computes NTSCD. 

The DOD algorithm starts at line 20 of Algorithm 4. The first step is to
compute the sets $V_p$ for all predicate nodes p of a given CFG G. The computation
of these sets can be found in Algorithm 3. It is a slightly modified version
of Algorithm 2. Recall that the procedure COMPUTE(n) of Algorithm 2 marks red
every node such that all maximal paths from the node contain n. The procedure
COMPUTE(n) of Algorithm 3 does in principle the same, but instead of the red
color it marks the nodes with the identifier of the node n. Every node m collects
these marks in the set $V_m$. After we run COMPUTE(n) for all the nodes n in the
graph, each node m has in its set $V_m$ precisely all nodes that are on all maximal
paths from m. For the computation of DOD, only the sets $V_p$ for predicate nodes
p are needed, but the extension computing NTSCD may use all these sets.
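
The computation of the sets $V_n$ and their use for NTSCD can be sketched as follows. As before, the recursion of VISIT is replaced by an explicit stack, and Python sets stand in for the bitvectors mentioned in the complexity analysis; both are choices of this sketch. The names `compute_v_sets` and `ntscd_from_v_sets` and the graph `FIG2` (inferred from the discussion of Fig. 2) are assumptions.

```python
def compute_v_sets(succ):
    """Return a map n -> V_n, the set of nodes on all maximal paths from n."""
    pred = {m: [] for m in succ}
    for m, targets in succ.items():
        for t in targets:
            pred[t].append(m)
    v = {m: set() for m in succ}
    for n in succ:                               # COMPUTE(n)
        counter = {m: len(succ[m]) for m in succ}
        v[n].add(n)
        stack = list(pred[n])
        while stack:                             # pending VISIT(m, n) calls
            m = stack.pop()
            counter[m] -= 1
            if counter[m] == 0 and n not in v[m]:
                v[m].add(n)                      # m is 'colored' with n
                stack.extend(pred[m])
    return v


def ntscd_from_v_sets(succ, v):
    """NTSCD as pairs (p, n): n lies in exactly one of V_s1 and V_s2."""
    result = set()
    for p, targets in succ.items():
        if len(targets) == 2:
            s1, s2 = targets
            for n in (v[s1] - v[s2]) | (v[s2] - v[s1]):
                result.add((p, n))
    return result


# Edges inferred from the discussion of Fig. 2 (an assumption).
FIG2 = {1: [2, 6], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}
V_SETS = compute_v_sets(FIG2)
print(V_SETS[2])                                 # nodes on all maximal paths from 2
print(sorted(ntscd_from_v_sets(FIG2, V_SETS)))
```

The second function mirrors the set expression used for NTSCD in the extension of the DOD algorithm: a node depends on p if it is in the set of one successor of p but not in the set of the other.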

When the sets $V_p$ are calculated, we compute DOD (and NTSCD) dependencies
for each predicate node separately by procedures COMPUTEDOD(p)
and COMPUTENTSCD(p). The procedure COMPUTEDOD(p) first constructs the
graph $A_p$ with the use of BUILDA_p(p). Nodes of the graph are those of $V_p$. To
compute edges, we trigger depth-first search in G from each $n \in V_p$. If we find
a node $m \in V_p$, we add the edge (n, m) to the graph $A_p$ and stop the search on
this path. When the graph $A_p$ is constructed, we check whether it has the right
shape. If not, we return $\emptyset$ as there are no nodes DOD on p in this case.


Algorithm 4: The new DOD algorithm which also computes NTSCD if the gray
lines (marked with asterisks) are included (COMPUTEVpS is given in Algorithm 3)

Input: a CFG G = (V, E)
Output: the DOD relation stored in dod (and NTSCD stored in ntscd)


  1  Procedure COMPUTEDOD(p)                  // Computation of DOD for predicate p
  2      A_p ← BUILDA_p(p)                    // Get the graph A_p
  3      if A_p does not have the right shape then
  4          return ∅
  5      S_1, S_2 ← COMPUTES_1S_2(p)          // Get the sets S_1, S_2
  6      if S_1 ∩ S_2 ≠ ∅ then
  7          return ∅
  8      n_1 n_2 ... n_k ← UNFOLDCYCLE(A_p, S_1)      // Unfold the cycle of A_p
  9      U ← V_p \ (S_1 ∪ S_2)
 10      if n_1 n_2 ... n_k ∉ (S_1.U*)*.(S_2.U*)*.(S_1.U*)* then    // Apply Thm. 3
 11          return ∅
 12      m_1 ... m_i ← EXTRACT(n_1 n_2 ... n_k, S_1.U*.S_2)         // Apply Thm. 4
 13      o_1 ... o_j ← EXTRACT(n_1 n_2 ... n_k, S_2.U*.S_1)
 14      return {$p \xrightarrow{\mathrm{DOD}} \{a, b\}$ | a ∈ {m_1, ..., m_{i-1}}, b ∈ {o_1, ..., o_{j-1}}}
 15
*16  Procedure COMPUTENTSCD(p)                // Computation of NTSCD for predicate p
*17      {s_1, s_2} ← Successors(p)
*18      return {$p \xrightarrow{\mathrm{NTSCD}} n$ | n ∈ (V_{s_1} \ V_{s_2}) ∪ (V_{s_2} \ V_{s_1})}
 19
 20  COMPUTEVpS                               // Computation of DOD and NTSCD for all nodes
 21  dod ← ∅
*22  ntscd ← ∅
 23  for p ∈ Predicates(G) do
 24      dod ← dod ∪ COMPUTEDOD(p)
*25      ntscd ← ntscd ∪ COMPUTENTSCD(p)


The next step is to compute the sets $S_1$ and $S_2$. Again, we apply a similar
depth-first search as in the construction of $A_p$ described above. If the sets $S_1$, $S_2$
are not disjoint, we return $\emptyset$ as there are no nodes DOD on p.

Then we unfold the cycle in $A_p$ from an arbitrary node in $S_1$, compute the
set U, and check whether the unfolding matches $(S_1.U^*)^*.(S_2.U^*)^*.(S_1.U^*)^*$.
Note that any unfolding starting in $S_1$ matches this language iff the cycle has
an unfolding of the form $S_1.U^*.(S_2.U^*)^*.S_2.U^*.(S_1.U^*)^*$ of Theorem 3. Hence,
we return $\emptyset$ if the check fails.
Fig. 7. A CFG with |V| nodes that has the DOD relation of size $\Theta(|V|^3)$.


Finally, we extract the paths of the form $S_1.U^*.S_2$ and $S_2.U^*.S_1$ from the
unfolding. Note that the last node of the latter path can be the first node of the
unfolding. We then compute the DOD dependencies according to Theorem 4.

The procedure COMPUTENTSCD(p) used for the computation of NTSCD
simply follows Definition 2: it makes dependent on p each node that is on all
maximal paths from the successor $s_1$ but not on all maximal paths from the
successor $s_2$, or symmetrically for $s_2$ and $s_1$.

As the correctness of our algorithm comes directly from the observations
made in the previous subsection, it remains only to analyze its complexity. The
procedure COMPUTEVpS consists of two loops in sequence. The first loop runs
in $O(|V|)$. The second loop calls the procedure COMPUTE(n) $O(|V|)$ times. This
procedure is essentially identical to the procedure of the same name in Algorithm 2
and so is its time complexity, namely $O(|V| + |E|)$. Note that sets can
be represented by bitvectors and therefore adding an element and checking the
presence of an element in a set are constant-time. Overall, the procedure
COMPUTEVpS runs in $O(|V| \cdot (|V| + |E|))$, which is $O(|V|^2)$ for CFGs.

Now we discuss the complexity of the procedure COMPUTEDOD(p). Creating
the graph $A_p$ requires calling depth-first search $O(|V|)$ times, which yields
$O(|V| \cdot |E|)$ in total. Computation of $S_1$, $S_2$ requires another two calls of
depth-first search, which is in $O(|E|)$. When sets are represented as bitvectors, checking
that $S_1$ and $S_2$ are disjoint is in $O(|V|)$. Unfolding the cycle, matching the unfolding
to the language (line 10), and the procedure EXTRACT run also in $O(|V|)$.
The construction of the DOD relation on line 14 is in $O(|V|^2)$. Altogether,
COMPUTEDOD(p) runs in $O(|V| \cdot |E| + |V|^2)$, which simplifies to $O(|V|^2)$ for CFGs.

COMPUTEDOD is called $O(|V|)$ times, so the overall complexity of computing
DOD for a CFG G = (V, E) is $O(|V|^3)$. If we compute also NTSCD, we make
$O(|V|)$ extra calls to COMPUTENTSCD(p), where one call takes $O(|V|)$ time.
Therefore, the asymptotic complexity of computing NTSCD with DOD does not
change from computing DOD only.

Our algorithm running in time $O(|V|^3)$ is asymptotically optimal as there
exist graphs with DOD relations of size $\Theta(|V|^3)$. For example, the CFG in Fig. 7
has |V| = 4k + 1 nodes, and the size of the corresponding DOD relation is cubic
in k and hence in $\Theta(|V|^3)$.


5 Comparison to Control Closures 


In 2011, Danicic et al. [13] introduced control closures (CC) that generalize con- 
trol dependence from CFGs to arbitrary graphs. In particular, strong control 
closure, which is sensitive to non-termination, generalizes strong control depen- 
dence including NTSCD and DOD. 


Definition 8 (Strongly control-closed set). Let G = (V, E) be a CFG and
let $U \subseteq V$. The set U is strongly control-closed³ in G if and only if for every
node $v \in V \setminus U$ that is reachable in G from a node in U, one of these holds:


- there is no node in U reachable from v or
- there exists a node $u \in U$ such that all maximal paths from v contain u and
  it is the first node from U on all these paths.


In other words, whenever we leave a strongly control-closed set, we either 
cannot return back or we have to return back to the set in a certain node. 


Definition 9 (Strong control closure, strong CC). Let G = (V, E) be a
CFG and $V' \subseteq V$. A strong control closure (strong CC) of V' is a strongly
control-closed set $U \supseteq V'$ such that there is no strongly control-closed set U'
satisfying $U \supsetneq U' \supseteq V'$.


Danicic et al. present an algorithm for the computation of strong control
closures running in $O(|V|^4)$ [13, Theorem 66]. In fact, the algorithm uses a
procedure $\Gamma$ that is very similar to our procedure COMPUTE(n) of Algorithm 2.

We can also define the closure of a set of nodes under NTSCD and DOD. 


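Definition 10 (NTSCD and DOD closure). Let G = (V, E) be a CFG. The
NTSCD and DOD closure of a set $V' \subseteq V$ is the smallest set $U \supseteq V'$ satisfying


$(n \in U \wedge p \xrightarrow{\mathrm{NTSCD}} n) \implies p \in U$ and $(a, b \in U \wedge p \xrightarrow{\mathrm{DOD}} \{a, b\}) \implies p \in U$.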


Definition 10 directly provides an algorithm computing the NTSCD and DOD
closure of a given set $V' \subseteq V$. Roughly speaking, if we represent the NTSCD
relation with edges and the DOD relation with hyperedges in a directed hypergraph
with nodes $V$, the closure computation amounts to gathering backward
reachable nodes from $V'$.
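
The backward reachability described above can be sketched as a worklist computation. The following Python fragment is only an illustration under stated assumptions: it takes the NTSCD relation as pairs `(p, n)` and the DOD relation as triples `(p, a, b)` that have already been computed, and the function name `ntscd_dod_closure` is hypothetical (it is not the DG implementation).

```python
from collections import defaultdict, deque

def ntscd_dod_closure(seeds, ntscd, dod):
    """Smallest superset of `seeds` closed under NTSCD and DOD (Definition 10).

    ntscd: iterable of pairs (p, n), read as "n is NTSCD-dependent on p"
    dod:   iterable of triples (p, a, b), read as "{a, b} is DOD-dependent on p"
    """
    # Index both relations by the dependent node so we can walk them backwards.
    preds_of = defaultdict(set)        # n -> {p}
    for p, n in ntscd:
        preds_of[n].add(p)
    hyper_of = defaultdict(list)       # a -> [(p, b)] and b -> [(p, a)]
    for p, a, b in dod:
        hyper_of[a].append((p, b))
        hyper_of[b].append((p, a))

    closure = set(seeds)
    worklist = deque(closure)
    while worklist:
        n = worklist.popleft()
        # Ordinary edges: every p with p -NTSCD-> n joins the closure.
        new = set(preds_of[n])
        # Hyperedges: p joins only once both a and b are in the closure.
        new.update(p for p, other in hyper_of[n] if other in closure)
        for p in new - closure:
            closure.add(p)
            worklist.append(p)
    return closure
```

A hyperedge fires when its second endpoint is processed, because by then the first endpoint is already in `closure`. The loop itself is linear in the size of the two relations, so computing the relations dominates the cost.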


3 We adjusted the definition to the fact that predicates in our CFGs always have two
outgoing edges (i.e., they are complete in terms of Danicic et al. [13]). The original
definition [13] works with CFGs where each predicate has at most two successors
and considers also paths that may end in a predicate with fewer than two successors.

Danicic et al. [13, Lemmas 93 and 94] proved that for a CFG $G = (V, E)$ with
a distinguished start node from which all nodes in $V$ are reachable and a subset
$U \subseteq V$ such that $\mathit{start} \in U$, the set $U$ is strongly control-closed iff it is closed
under NTSCD and DOD. Hence, on graphs with such a start node, the strong
CC of a set $V'$ containing the start node can be computed also by computing
its NTSCD and DOD closure. Computation of the NTSCD and DOD closure
runs in $O(|V|^3)$ as the backward reachability is dominated by the computation
of NTSCD and DOD relations.

A substantial difference between the algorithm for strong CC by Danicic et 
al. [13] and our algorithm is that we are able to compute DOD and NTSCD 
separately, whereas the former is not. Moreover, our algorithm for NTSCD and 
DOD closure is asymptotically faster. 


6 Experimental Evaluation 


We implemented our algorithms for the computation of NTSCD, DOD, and the 
NTSCD and DOD closure in C++ on top of the LLVM [25] infrastructure. The 
implementation is a part of the library for program analysis and slicing called 
DG [6], which is used for example in the verification and test generation tool 
Symbiotic [7]. We also implemented the original algorithms of Ranganath et al.
for NTSCD and DOD, the fixed versions of these algorithms from Subsects. 3.1
and 4.1, and the algorithm for the computation of strong CC by Danicic et al.

In the implementation of the strong CC algorithm by Danicic et al. [13], we
use our procedure COMPUTE(n) of Algorithm 2 to implement the function $\Gamma$.
This should have only a positive effect as this procedure is more efficient than
iterating over all edges in a copy of the graph and removing them [13].

In our experiments, we use CFGs of functions (where nodes of the CFG
represent basic blocks of the function) obtained in the following way. We took all
benchmarks from the Competition on Software Verification (SV-COMP) 2020.⁴
These benchmarks contain much artificial or generated code, but also a lot of
real-life code, e.g., from the Linux project. Each source code file was compiled
with CLANG into LLVM and preprocessed by the -lowerswitch pass to ensure
that every basic block has at most two successors. Then we extracted individual
functions and removed those with fewer than 100 basic blocks, as the computation
of control dependence runs swiftly on small graphs. Because one function may be
present in multiple benchmarks, the next step was to remove
these duplicate functions. For every function, we computed the number of nodes
and edges in its CFG, and performed DFS on the CFG to obtain the number
of tree, forward, cross and back edges, and the depth of the DFS tree. If two or
more functions shared the name and all the computed numbers, we kept only
one such function. Note that this process may also have removed a function that
was not a duplicate of any other, but only with a low probability. At the end,
we were left with 2440 functions. The biggest function has 27851 basic blocks.
Table 2 shows the distribution of the sizes of the generated CFGs.
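
The duplicate filter described above amounts to comparing a small structural fingerprint per function. The sketch below illustrates one way to compute such a fingerprint in Python. It is not the script used for the experiments: the names `cfg_fingerprint` and `deduplicate` are hypothetical, the CFG is assumed to be a dictionary mapping every node to its ordered successor list, and all nodes are assumed reachable from the entry node. The DFS is iterative so that it also copes with CFGs of tens of thousands of blocks.

```python
def cfg_fingerprint(name, succs, entry):
    """Return (name, #nodes, #edges, #tree, #forward, #cross, #back, DFS depth)."""
    disc, fin = {entry: 0}, {}          # discovery and finishing times
    depth = {entry: 0}
    counts = {"tree": 0, "forward": 0, "cross": 0, "back": 0}
    clock, max_depth = 1, 0
    stack = [(entry, iter(succs.get(entry, ())))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if nxt not in disc:                 # first visit: tree edge
                counts["tree"] += 1
                disc[nxt] = clock
                clock += 1
                depth[nxt] = depth[node] + 1
                max_depth = max(max_depth, depth[nxt])
                stack.append((nxt, iter(succs.get(nxt, ()))))
                break
            elif nxt not in fin:                # target still on the DFS stack
                counts["back"] += 1
            elif disc[node] < disc[nxt]:        # finished descendant
                counts["forward"] += 1
            else:                               # finished, not a descendant
                counts["cross"] += 1
        else:
            fin[node] = clock                   # all successors handled
            clock += 1
            stack.pop()
    n_edges = sum(len(targets) for targets in succs.values())
    return (name, len(succs), n_edges, counts["tree"], counts["forward"],
            counts["cross"], counts["back"], max_depth)


def deduplicate(functions):
    """Keep one (name, succs, entry) triple per distinct fingerprint."""
    kept = {}
    for name, succs, entry in functions:
        kept.setdefault(cfg_fingerprint(name, succs, entry), (name, succs, entry))
    return list(kept.values())
```

Two structurally different functions can still share a fingerprint, which is the low-probability false positive mentioned in the text.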


4 https://github.com/sosy-lab/sv-benchmarks, tag svcomp20.

Table 2. The numbers of considered CFGs by their sizes. The size of a CFG is the
number of its nodes, which is the number of basic blocks of the corresponding function.


size        number    size        number    size         number
100-199     1713      500-599     35        900-999      3
200-299     355       600-699     29        1000-1999    23
300-399     159       700-799     8         2000-9999    22
400-499     73        800-899     T         > 10000      3


[Figure 8: two scatter plots. x-axes: New NTSCD algorithm [s], 0 to 100; y-axis of the right plot: Original NTSCD algorithm (fixed) [s].]

Fig. 8. Comparison of the running times of the new NTSCD algorithm and the
incorrect (left) and the fixed (right) versions of the original NTSCD algorithm. TO stands
for timeout.


The experiments were run on machines with an AMD EPYC CPU running at
3.1 GHz. Each benchmark run was constrained to 1 core and 8 GB
of RAM. We used the tool BenchExec [4] to enforce resource isolation and to
measure resource usage. All presented times are CPU times. We set the timeout to
100 s for each algorithm run.

In the following, original algorithms refers to the algorithms of Ranganath 
et al. (we distinguish between the incorrect and the fixed versions when needed) 
and new algorithms refers to the algorithms introduced in this paper. 


NTSCD Algorithms. In the first set of experiments, we compared the new 
algorithm for NTSCD against the incorrect and the fixed version of the original 
NTSCD algorithm. Although it seems that comparing to the incorrect version 
is meaningless, we did not want to compare only to the fixed version as the 
provided fix slows down the algorithm. 

The results are depicted in Fig. 8. The left scatter plot compares the new
algorithm to the incorrect original algorithm, and the right scatter plot
compares it to the fixed original algorithm. As we can see,
the new algorithm outperforms the original algorithm significantly. The incorrect
original algorithm produced a wrong NTSCD relation in 98.6% of the considered
benchmarks. The fixed version of the original algorithm returned precisely the
same NTSCD relations as the new algorithm. We can also see that the scatter
plot on the right contains more timeouts of the original algorithm. This supports
the claim that the fix slows down the original algorithm.


Fig. 9. Comparison of the running times of the new and the (fixed) original DOD
algorithm. We use the considered benchmarks (left) and random graphs with 500 nodes
and the number of edges specified by the x-axis (right).


DOD Algorithms. We compared the new DOD algorithm to the fixed version 
of the original DOD algorithm. As the fix does not change the asymptotic com- 
plexity of the original algorithm, we do not compare the new algorithm with the 
incorrect version of the original algorithm. The results of the experiments are 
displayed in Fig. 9 (left). We can see that the new algorithm is again very fast. 
In fact, the results resemble the results of the pure NTSCD algorithm, which is
basically the part of the DOD algorithm that computes the $V_p$ sets. The new DOD
algorithm also benefits from early checks that detect predicate nodes with no DOD dependencies.

As mentioned in the introduction, DOD is empty for structured programs
as their CFGs are reducible. We do not know precisely how many of the 2440
considered functions have irreducible CFGs, but we know that 2373 of them use
goto statements. The DOD relation was non-empty for 12 functions, which means
that the CFGs of these functions are irreducible. Note that there may have been
other irreducible CFGs with an empty DOD relation.

Fig. 10. Comparison of the running times of the strong CC algorithm by Danicic et
al. [13] and our algorithm for the NTSCD and DOD closure.


Additionally, we tested the DOD algorithms on randomly generated graphs,
where we can expect that irreducible graphs emerge more often. Figure 9 (right)
shows the results for graphs that have 500 nodes and 50, 100, 150, ... randomly
distributed edges (such that every node has at most two successors). Each
presented running time is in fact an average of 10 measurements with different
random graphs. We can see that the new algorithm is agnostic to the number of
edges. Its running time in this experiment ranges from $4.12 \cdot 10^{-3}$ to $8.89 \cdot 10^{-3}$
seconds. The original DOD algorithm does not scale well with the increasing
number of edges.
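
The random graphs used above can be produced in many ways, and the text does not specify the generator. The Python sketch below shows one possible construction, purely as an assumption: every node offers two outgoing "slots", a fixed number of slots is drawn without replacement, and each drawn slot gets a uniformly random target. The names `random_cfg` and `average_cpu_time` are hypothetical, and timing with `time.process_time` stands in for the BenchExec measurement used in the experiments.

```python
import random
import time

def random_cfg(num_nodes, num_edges, rng):
    """Random digraph as a list of successor lists, each of length at most 2."""
    if num_nodes < 2 or num_edges > 2 * num_nodes:
        raise ValueError("need at least 2 nodes and at most 2 edges per node")
    succs = [[] for _ in range(num_nodes)]
    # Two slots per node guarantee the out-degree bound.
    slots = [u for u in range(num_nodes) for _ in range(2)]
    for u in rng.sample(slots, num_edges):
        v = rng.randrange(num_nodes)
        while v in succs[u]:            # avoid parallel edges (self-loops allowed)
            v = rng.randrange(num_nodes)
        succs[u].append(v)
    return succs

def average_cpu_time(algorithm, num_nodes, num_edges, runs=10, seed=0):
    """Average CPU time of `algorithm` over `runs` different random graphs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        graph = random_cfg(num_nodes, num_edges, rng)
        start = time.process_time()
        algorithm(graph)
        total += time.process_time() - start
    return total / runs
```

A call such as `average_cpu_time(my_dod, 500, 150)` would then correspond to one data point of the plot, where `my_dod` is whichever DOD implementation is being measured.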


Strong CC Algorithm. We also compare the strong CC algorithm of Dani- 
cic et al. [13] against our NTSCD and DOD closure algorithm on sets of nodes 
containing a distinguished start node, where these two algorithms produce equiv- 
alent results. For these experiments, we need a starting set that is going to be 
closed. We decided to run these experiments on the considered functions that 
have at least two exit points. The starting set consists of the node representing 
the entry point and the node representing one of the exit points. The closure of 
this set contains all nodes that may influence getting to the other exit points. 
The results are shown in the scatter plot in Fig. 10. Our algorithm clearly
outperforms the strong CC algorithm.


7 Conclusion 


We studied algorithms for the computation of strong control dependence, 
namely non-termination sensitive control dependence (NTSCD) and decisive 
order dependence (DOD) by Ranganath et al. [33] and strong control closures 
(strong CC) by Danicic et al. [13] on control flow graphs where each branching 
statement has two successors. We have demonstrated flaws in the original algo- 
rithms for computation of NTSCD and DOD and we have suggested corrections. 
Moreover, we have introduced new algorithms for NTSCD, DOD, and strong CC 
that are asymptotically faster. All the mentioned algorithms have been
implemented and our experiments confirm dramatically better performance of the new
algorithms.


Abstract. We present a novel verification technique to prove properties
of a class of array programs with a symbolic parameter N denoting the 
size of arrays. The technique relies on constructing two slightly different 
versions of the same program. It infers difference relations between the 
corresponding variables at key control points of the joint control-flow 
graph of the two program versions. The desired post-condition is then 
proved by inducting on the program parameter N, wherein the differ- 
ence invariants are crucially used in the inductive step. This contrasts 
with classical techniques that rely on finding potentially complex loop 
invariants for each loop in the program. Our synergistic combination of
inductive reasoning and finding simple difference invariants helps prove 
properties of programs that cannot be proved even by the winner of 
Arrays sub-category in SV-COMP 2021. We have implemented a proto- 
type tool called DIFFY to demonstrate these ideas. We present results 
comparing the performance of DIFFY with that of state-of-the-art tools. 


1 Introduction 


Software used in a wide range of applications uses arrays to store and update
data, often using loops to read and write arrays. Verifying correctness properties
of such array programs is important, yet challenging. A variety of techniques 
have been proposed in the literature to address this problem, including inference 
of quantified loop invariants [20]. However, it is often difficult to automatically 
infer such invariants, especially when programs have loops that are sequentially 
composed and/or nested within each other, and have complex control flows. 
This has spurred recent interest in mathematical induction-based techniques for 
verifying parametric properties of array manipulating programs [11,12,42,44]. 
While induction-based techniques are efficient and quite powerful, their Achilles 
heel is the automation of the inductive argument. Indeed, this often becomes 
the limiting step in applications of induction-based techniques. Automating the 
induction step and expanding the class of array manipulating programs to which 
induction-based techniques can be applied forms the primary motivation for our 
work. Rather than being a stand-alone technique, we envisage our work being 
used as part of a portfolio of techniques in a modern program verification tool. 


© The Author(s) 2021 
A. Silva and K. R. M. Leino (Eds.): CAV 2021, LNCS 12760, pp. 911-935, 2021. 
https://doi.org/10.1007/978-3-030-81688-9_42

We propose a novel and practically efficient induction-based technique that
advances the state-of-the-art in automating the inductive step when reasoning 
about array manipulating programs. This allows us to automatically verify inter- 
esting properties of a large class of array manipulating programs that are beyond 
the reach of state-of-the-art induction-based techniques, viz. [12,42]. The work 
that comes closest to us is VAJRA [12], which is part of the portfolio of tech- 
niques in VERIABS [1] — the winner of SV-COMP 2021 in the Arrays Reach 
sub-category. Our work addresses several key limitations of the technique imple- 
mented in VAJRA, thereby making it possible to analyze a much larger class of 
array manipulating programs than can be done by VERIABS. Significantly, this 
includes programs with nested loops that have hitherto been beyond the reach 
of automated techniques that use mathematical induction [12,42,44].

A key innovation in our approach is the construction of two slightly differ- 
ent versions of a given program that have identical control flow structures but 
slightly different data operations. We automatically identify simple relations, 
called difference invariants, between corresponding variables in the two versions 
of a program at key control flow points. Interestingly, these relations often turn 
out to be significantly simpler than inductive invariants required to prove the 
property directly. This is not entirely surprising, since the difference invariants 
depend less on what individual statements in the programs are doing, and more 
on the difference between what they are doing in the two versions of the pro- 
gram. We show how the two versions of a given program can be automatically 
constructed, and how differences in individual statements can be analyzed to 
infer simple difference invariants. Finally, we show how these difference invari- 
ants can be used to simplify the reasoning in the inductive step of our technique. 

We consider programs with (possibly nested) loops manipulating arrays,
where the size of each array is a symbolic integer parameter $N$ ($> 0$)¹. We
verify (a sub-class of) quantified and quantifier-free properties that may depend
on the symbolic parameter $N$. Like in [12], we view the verification problem as
one of proving the validity of a parameterized Hoare triple $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$
for all values of $N$ ($> 0$), where arrays are of size $N$ in the program $P_N$, and $N$
is a free variable in $\varphi(\cdot)$ and $\psi(\cdot)$.

To illustrate the kind of programs that are amenable to our technique, con- 
sider the program shown in Fig. 1(a), adapted from an SV-COMP benchmark. 
This program has a couple of sequentially composed loops that update arrays 
and scalars. The scalars S and F are initialized to 0 and 1 respectively before 
the first loop starts iterating. Subsequently, the first loop computes a recurrence 
in variable S and initializes elements of the array B to 1 if the corresponding 
elements of array A have non-negative values, and to 0 otherwise. The outermost 
branch condition in the body of the second loop evaluates to true only if the
program parameter N and the variable S have the same value. The value of F is
reset based on some conditions depending on corresponding entries of arrays A 
and B. The pre-condition of this program is true; the post-condition asserts that 
F is never reset in the second loop. 


1 For a more general class of programs supported by our technique, please see [13].

    // assume(true)
    S = 0; F = 1;
    for (i = 0; i < N; i++) {
      S = S + 1;
      if (A[i] >= 0) B[i] = 1;
      else B[i] = 0;
    }
    for (j = 0; j < N; j++) {
      if (S == N) {
        if (A[j] >= 0 && !B[j]) F = 0;
        if (A[j] < 0 && B[j]) F = 0;
      }
    }
    // assert(F == 1)
    (a)

    // assume(true)
    S = 0;
    for (i = 0; i < N; i++) A[i] = 0;
    for (j = 0; j < N; j++) S = S + 1;
    for (k = 0; k < N; k++) {
      for (l = 0; l < N; l++) A[l] = A[l] + 1;
      A[k] = A[k] + S;
    }
    // assert(forall x in [0,N), A[x] == 2*N)
    (b)


Fig. 1. Motivating examples


State-of-the-art techniques find it difficult to prove the assertion in this pro- 
gram. Specifically, VAJRA [12] is unable to prove the property, since it cannot 
reason about the branch condition (in the second loop) whose value depends on 
the program parameter N. VERIABS [1], which employs a sequence of techniques 
such as loop shrinking, loop pruning, and inductive reasoning using [12] is also 
unable to verify the assertion shown in this program. Indeed, the loops in this 
program cannot be merged as the final value of S computed by the first loop 
is required in the second loop; hence loop shrinking does not help. Also, loop 
pruning does not work due to the complex dependencies in the program and the 
fact that the exact value of the recurrence variable S is required to verify the 
program. Subsequent abstractions and techniques applied by VERIABS from its 
portfolio are also unable to verify the given post-condition. VIAP [42] translates 
the program to a quantified first-order logic formula in the theory of equality 
and uninterpreted functions [32]. It applies a sequence of tactics to simplify and 
prove the generated formula. These tactics include computing closed forms of 
recurrences, induction over array indices and the like to prove the property. How- 
ever, its sequence of tactics is unable to verify this example within our time limit 
of 1 min. 

Benchmarks with nested loops are a long-standing challenge for most verifiers.
Consider the program shown in Fig. 1(b) with a nested loop in addition
to sequentially composed loops. The first loop initializes entries in array A to 
0. The second loop aggregates a constant value in the scalar S. The third loop 
is a nested loop that updates array A based on the value of S. The entries of 
A are updated in the inner as well as outer loop. The property asserts that on 
termination, each array element equals twice the value of the parameter N. 
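
To see why the asserted value is $2N$, it helps to run the program for a few concrete sizes. The following Python transcription of Fig. 1(b) is only a sanity check for small values of N, not a proof, and it assumes that S is initialized to 0 before the loops. The function name `run_fig1b` is hypothetical.

```python
def run_fig1b(n):
    """Execute the program of Fig. 1(b) for a concrete array size n."""
    a = [None] * n                  # initial contents are irrelevant
    s = 0                           # assumed initialization of S
    for i in range(n):              # first loop: A[i] = 0
        a[i] = 0
    for j in range(n):              # second loop: S counts up to n
        s = s + 1
    for k in range(n):              # third (nested) loop
        for l in range(n):
            a[l] = a[l] + 1         # every element gains 1 per outer iteration
        a[k] = a[k] + s             # element k additionally gains S = n once
    return a

# Each element receives n increments of 1 plus one increment of n.
for n in range(1, 30):
    assert all(value == 2 * n for value in run_fig1b(n))
```

The check passes for every tested size, but establishing the property for all N is exactly what requires the inductive argument developed in the paper.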

While the inductive reasoning of VAJRA and the tactics in VIAP do not sup- 
port nested loops, the sequence of techniques used by VERIABS is also unable to 
prove the given post-condition in this program. In sharp contrast, our prototype 
tool DIFFY is able to verify the assertions in both these programs automatically
within a few seconds. This illustrates the power of the inductive technique
proposed in this paper.

The technical contributions of the paper can be summarized as follows:


- We present a novel technique based on mathematical induction to prove
  interesting properties of a class of programs that manipulate arrays. The crucial
  inductive step in our technique uses difference invariants from two slightly
  different versions of the same program, and differs significantly from other
  induction-based techniques proposed in the literature [11,12,42,44].

- We describe algorithms to transform the input program for use in our
  inductive verification technique. We also present techniques to infer simple
  difference invariants from the two slightly different program versions, and to
  complete the inductive step using these difference invariants.

- We describe a prototype tool DIFFY that implements our algorithms.

- We compare DIFFY vis-à-vis state-of-the-art tools for verification of C
  programs that manipulate arrays on a large set of benchmarks. We demonstrate
  that DIFFY significantly outperforms the winners of SV-COMP 2019, 2020
  and 2021 in the Array Reach sub-category.


2 Overview and Relation to Earlier Work 


In this section, we provide an overview of the main ideas underlying our
technique. We also highlight how our technique differs from [12], which comes closest
to our work. To keep the exposition simple, we consider the program $P_N$, shown
in the first column of Fig. 2, where $N$ is a symbolic parameter denoting the sizes
of arrays a and b. We assume that we are given a parameterized pre-condition
$\varphi(N)$, and our goal is to establish the parameterized post-condition $\psi(N)$, for
all $N > 0$. In [12,44], techniques based on mathematical induction (on $N$) were
proposed to solve this class of problems. As with any induction-based technique,
these approaches consist of three steps. First, they check if the base case holds,
i.e. if the Hoare triple $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$ holds for small values of $N$, say
$1 \le N \le M$, for some $M > 0$. Next, they assume that the inductive hypothesis
$\{\varphi(N-1)\}\ P_{N-1}\ \{\psi(N-1)\}$ holds for some $N \ge M + 1$. Finally, in
the inductive step, they show that if the inductive hypothesis holds, so does
$\{\varphi(N)\}\ P_N\ \{\psi(N)\}$. It is not hard to see that the inductive step is the most
crucial step in this style of reasoning. It is also often the limiting step, since not
all programs and properties allow for efficient inference of $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$
from $\{\varphi(N-1)\}\ P_{N-1}\ \{\psi(N-1)\}$.
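
The three steps can be summarized as a single proof rule. The display below is a paraphrase of the scheme described above, written with the same symbols; it is a sketch of the reasoning pattern, not a rule quoted from [12] or [44].

```latex
\[
\frac{
  \begin{array}{l}
    \text{(base)}\quad \{\varphi(N)\}\ P_N\ \{\psi(N)\}
      \ \text{ for all } 1 \le N \le M \\[2pt]
    \text{(step)}\quad \{\varphi(N-1)\}\ P_{N-1}\ \{\psi(N-1)\}
      \implies \{\varphi(N)\}\ P_N\ \{\psi(N)\}
      \ \text{ for all } N \ge M+1
  \end{array}
}{
  \{\varphi(N)\}\ P_N\ \{\psi(N)\} \ \text{ for all } N > 0
}
\]
```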

Like in [12,44], our technique uses induction on $N$ to prove the Hoare triple
$\{\varphi(N)\}\ P_N\ \{\psi(N)\}$ for all $N > 0$. Hence, our base case and inductive hypothesis
are the same as those in [12,44]. However, our reasoning in the crucial inductive 
step is significantly different from that in [12,44], and this is where our primary 
contribution lies. As we show later, not only does this allow a much larger class of 
programs to be efficiently verified compared to [12,44], it also permits reasoning 
about classes of programs with nested loops, that are beyond the reach of [12,44]. 
Since the work of [12] significantly generalizes that of [44], henceforth, we only 
refer to [12] when talking of earlier work that uses induction on N. 

In order to better understand our contribution and its difference vis-à-vis the
work of [12], a quick recap of the inductive step used in [12] is essential. The
inductive step in [12] crucially relies on finding a "difference program" $\partial P_N$ and a
"difference pre-condition" $\partial\varphi(N)$ such that: (i) $P_N$ is semantically equivalent to
$P_{N-1}; \partial P_N$, where ';' denotes sequential composition of programs², (ii) $\varphi(N) \Rightarrow
\varphi(N-1) \land \partial\varphi(N)$, and (iii) no variable/array element in $\partial\varphi(N)$ is modified by
$P_{N-1}$. As shown in [12], once $\partial P_N$ and $\partial\varphi(N)$ satisfying these conditions are
obtained, the problem of proving $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$ can be reduced to that of
proving $\{\psi(N-1) \land \partial\varphi(N)\}\ \partial P_N\ \{\psi(N)\}$. This approach can be very effective
if (i) $\partial P_N$ is "simpler" (e.g. has fewer loops or strictly less deeply nested loops)
than $P_N$ and can be computed efficiently, and (ii) a formula $\partial\varphi(N)$ satisfying
the conditions mentioned above exists and can be computed efficiently.


[Figure 2: four program columns. Column 1: $P_N$, with pre-condition $\varphi(N) \equiv \mathit{true}$ and post-condition $\psi(N) \equiv (\forall j.\ b[j] = j + N^3)$. Column 2: $P_N$ with the last iteration of each loop peeled. Column 3: $Q_{N-1}$ followed by $\mathrm{peel}(P_N)$. Column 4: $P_{N-1}$ followed by $\partial P_N$. The peeled statements are x = x + N*N; a[N-1] = a[N-1] + N; b[N-1] = x + N-1. The difference program $\partial P_N$ in column 4 reads:

    for(i=0; i<N-1; i++) { x = x + 2*N-1; a[i] = a[i] + 1; }
    x = x + N*N;
    a[N-1] = a[N-1] + N;
    for(k=0; k<N-1; k++) b[k] = b[k] + (N-1)*(2*N-1) + N*N;
    b[N-1] = x + N-1;
]

Fig. 2. Pictorial depiction of our program transformations

The requirement of $P_N$ being semantically equivalent to $P_{N-1}; \partial P_N$ is a very
stringent one, and finding such a program $\partial P_N$ is non-trivial in general. In fact,
the authors of [12] simply provide a set of syntax-guided conditionally sound
heuristics for computing $\partial P_N$. Unfortunately, when these conditions are violated
(we have found many simple programs where they are violated), there are no
known algorithmic techniques to generate $\partial P_N$ in a sound manner. Even if a
program $\partial P_N$ were to be found in an ad-hoc manner, it may be as "complex" as $P_N$
itself. This makes the approach of [12] ineffective for analyzing such programs.
As an example, the fourth column of Fig. 2 shows $P_{N-1}$ followed by one possible
$\partial P_N$ that ensures $P_N$ (shown in the first column of the same figure) is semantically
equivalent to $P_{N-1}; \partial P_N$. Notice that $\partial P_N$ in this example has two sequentially
composed loops, just like $P_N$ had. In addition, the assignment statement in
the body of the second loop uses a more complex expression than that present
in the corresponding loop of $P_N$. Proving $\{\psi(N-1) \land \partial\varphi(N)\}\ \partial P_N\ \{\psi(N)\}$
may therefore not be any simpler (perhaps even more difficult) than proving
$\{\varphi(N)\}\ P_N\ \{\psi(N)\}$.


2 Although the authors of [12] mention that it suffices to find a $\partial P_N$ that satisfies
$\{\varphi(N)\}\ P_{N-1}; \partial P_N\ \{\psi(N)\}$, they do not discuss any technique that takes $\varphi(N)$ or
$\psi(N)$ into account when generating $\partial P_N$.

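In addition to the difficulty of computing $\partial P_N$, it may be impossible to find
a formula $\partial\varphi(N)$ such that $\varphi(N) \Rightarrow \varphi(N-1) \land \partial\varphi(N)$, as required by [12]. This
can happen even for fairly routine pre-conditions, such as $\varphi(N) \equiv (\bigwedge_{i=0}^{N-1} A[i] = N)$.
Notice that there is no $\partial\varphi(N)$ that satisfies $\varphi(N) \Rightarrow \varphi(N-1) \land \partial\varphi(N)$ in
this case, since $\varphi(N-1)$ requires $A[i] = N-1$ where $\varphi(N)$ requires $A[i] = N$.
In such cases, the technique of [12] cannot be used at all, even if $P_N$,
$\varphi(N)$ and $\psi(N)$ are such that there exists a trivial proof of $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$.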

The inductive step proposed in this paper largely mitigates the above problems,
thereby making it possible to efficiently reason about a much larger class
of programs than that possible using the technique of [12]. Our inductive step
proceeds as follows. Given $P_N$, we first algorithmically construct two programs
$Q_{N-1}$ and $\mathrm{peel}(P_N)$, such that $P_N$ is semantically equivalent to $Q_{N-1}; \mathrm{peel}(P_N)$.
Intuitively, $Q_{N-1}$ is the same as $P_N$, but with all loop bounds that depend on $N$
now modified to depend on $N-1$ instead. Note that this is different from $P_{N-1}$,
which is obtained by replacing all uses (not just in loop bounds) of $N$ in $P_N$ by
$N-1$. As we will see, this simple difference makes the generation of $\mathrm{peel}(P_N)$
significantly simpler than generation of $\partial P_N$, as in [12]. While generating $Q_{N-1}$
and $\mathrm{peel}(P_N)$ may sound similar to generating $P_{N-1}$ and $\partial P_N$ [12], there are
fundamental differences between the two approaches. First, as noted above, $P_{N-1}$
is semantically different from $Q_{N-1}$. Similarly, $\mathrm{peel}(P_N)$ is also semantically
different from $\partial P_N$. Second, we provide an algorithm for generating $Q_{N-1}$ and
$\mathrm{peel}(P_N)$ that works for a significantly larger class of programs than that for
which the technique of [12] works. Specifically, our algorithm works for all
programs amenable to the technique of [12], and also for programs that violate
the restrictions imposed by the grammar and conditional heuristics in [12]. For
example, we can algorithmically generate $Q_{N-1}$ and $\mathrm{peel}(P_N)$ even for a class of
programs with arbitrarily nested loops, a program feature explicitly disallowed
by the grammar in [12]. Third, we guarantee that $\mathrm{peel}(P_N)$ is "simpler" than
$P_N$ in the sense that the maximum nesting depth of loops in $\mathrm{peel}(P_N)$ is strictly
less than that in $P_N$. Thus, if $P_N$ has no nested loops (all programs amenable to
analysis by [12] belong to this class), $\mathrm{peel}(P_N)$ is guaranteed to be loop-free. As
demonstrated by the fourth column of Fig. 2, no such guarantees can be given
for $\partial P_N$ generated by the technique of [12]. This is a significant difference, since
it greatly simplifies the analysis of $\mathrm{peel}(P_N)$ vis-à-vis that of $\partial P_N$.
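
Both decompositions can be checked on the running example with a small simulation. The Python sketch below is an illustration, not part of DIFFY. It assumes that the loop bodies of $P_N$ in Fig. 2 are `x = x + N*N; a[i] = a[i] + N` and `b[j] = x + j` (inferred from the peeled statements in the figure), and that the second loop of $Q_{N-1}$ reads `x + N*N` where $P_N$ reads `x`, as the transformation outlined at the end of this section suggests. All function names are hypothetical.

```python
def P(n, a, b):
    """P_n: every occurrence of the parameter, in bounds and bodies, is n."""
    x = 0
    for i in range(n):
        x = x + n * n
        a[i] = a[i] + n
    for j in range(n):
        b[j] = x + j
    return x

def dP(n, x, a, b):
    """Difference program from the fourth column of Fig. 2."""
    for i in range(n - 1):
        x = x + 2 * n - 1
        a[i] = a[i] + 1
    x = x + n * n
    a[n - 1] = a[n - 1] + n
    for k in range(n - 1):
        b[k] = b[k] + (n - 1) * (2 * n - 1) + n * n
    b[n - 1] = x + n - 1
    return x

def Q(n, a, b):
    """Q_{n-1}: loop bounds use n-1, every other use of the parameter stays n."""
    x = 0
    for i in range(n - 1):
        x = x + n * n
        a[i] = a[i] + n
    for j in range(n - 1):
        b[j] = (x + n * n) + j      # assumed rewrite of the use of x
    return x

def peel(n, x, a, b):
    """Loop-free peel: the last iteration of each loop of P_n."""
    x = x + n * n
    a[n - 1] = a[n - 1] + n
    b[n - 1] = x + n - 1
    return x

def final_state(run, n):
    a, b = list(range(n)), [7] * n  # arbitrary initial array contents
    x = run(n, a, b)
    return x, a, b

for n in range(1, 20):
    direct = final_state(P, n)
    via_difference = final_state(lambda m, a, b: dP(m, P(m - 1, a, b), a, b), n)
    via_peel = final_state(lambda m, a, b: peel(m, Q(m, a, b), a, b), n)
    assert direct == via_difference == via_peel
```

Note how `dP` still contains two loops while `peel` is straight-line code, which is the structural difference the paragraph above emphasizes.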

We had mentioned earlier that some pre-conditions $\varphi(N)$ do not admit any
$\partial\varphi(N)$ such that $\varphi(N) \Rightarrow \varphi(N-1) \land \partial\varphi(N)$. It is, however, often easy to
compute formulas $\varphi'(N-1)$ and $\Delta\varphi'(N)$ in such cases such that $\varphi(N) \Rightarrow
\varphi'(N-1) \land \Delta\varphi'(N)$, and the variables/array elements in $\Delta\varphi'(N)$ are not modified by
either $P_{N-1}$ or $Q_{N-1}$. For example, if we were to consider a (new) pre-condition
$\varphi(N) \equiv (\bigwedge_{i=0}^{N-1} A[i] = N)$ for the program $P_N$ shown in the first column of
Fig. 2, then we have $\varphi'(N-1) \equiv (\bigwedge_{i=0}^{N-2} A[i] = N)$ and $\Delta\varphi'(N) \equiv (A[N-1] = N)$.
We assume the availability of such a $\varphi'(N-1)$ and $\Delta\varphi'(N)$ for the given
$\varphi(N)$. This significantly relaxes the requirement on pre-conditions and allows a
much larger class of Hoare triples to be proved using our technique vis-à-vis that
of [12].

The third column of Fig. 2 shows $Q_{N-1}$ and $\mathrm{peel}(P_N)$ generated by our
algorithm for the program $P_N$ in the first column of the figure. It is illustrative to
compare these with $P_{N-1}$ and $\partial P_N$ shown in the fourth column of Fig. 2. Notice
that $Q_{N-1}$ has the same control flow structure as $P_{N-1}$, but is not semantically
equivalent to $P_{N-1}$. In fact, $Q_{N-1}$ and $P_{N-1}$ may be viewed as closely
related versions of the same program. Let $V_Q$ and $V_P$ denote the sets of
variables of $Q_{N-1}$ and $P_{N-1}$ respectively. We assume $V_Q$ is disjoint from $V_P$, and
analyze the joint execution of $Q_{N-1}$ starting from a state satisfying the
pre-condition $\varphi'(N-1)$, and $P_{N-1}$ starting from a state satisfying $\varphi(N-1)$. The
purpose of this analysis is to compute a difference predicate $D(V_Q, V_P, N-1)$
that relates corresponding variables in $Q_{N-1}$ and $P_{N-1}$ at the end of their joint
execution. The above problem is reminiscent of (yet, different from) translation
validation [4,17,24,40,46,48,49], and indeed, our calculation of $D(V_Q, V_P, N-1)$
is motivated by techniques from the translation validation literature. An
important finding of our study is that corresponding variables in $Q_{N-1}$ and $P_{N-1}$
are often related by simple expressions on $N$, regardless of the complexity of
$P_N$, $\varphi(N)$ or $\psi(N)$. Indeed, in all our experiments, we did not need to go beyond
quadratic expressions on $N$ to compute $D(V_Q, V_P, N-1)$.
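
For the running example, such a difference predicate can be worked out by hand. The derivation below is a sketch under the assumption that the first loop of $P_N$ in Fig. 2 has the body `x = x + N*N; a[i] = a[i] + N` (inferred from the peeled statements in the figure) and that both programs start from the same array contents. The subscripts $Q$ and $P$ mark the copies of a variable in $Q_{N-1}$ and $P_{N-1}$.

```latex
\begin{align*}
x_Q &= \sum_{i=0}^{N-2} N^2 = (N-1)\,N^2, \\
x_P &= \sum_{i=0}^{N-2} (N-1)^2 = (N-1)^3, \\
x_Q - x_P &= (N-1)\bigl(N^2 - (N-1)^2\bigr) = (N-1)(2N-1), \\
a_Q[i] - a_P[i] &= N - (N-1) = 1 \qquad (0 \le i < N-1).
\end{align*}
```

Both differences are at most quadratic in $N$, and they match the increments `x = x + 2*N-1` and `a[i] = a[i] + 1` in the first loop of $\partial P_N$ shown in the fourth column of Fig. 2.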

Once the steps described above are completed, we have $\Delta\varphi'(N)$, $\mathrm{peel}(P_N)$
and $D(V_Q, V_P, N-1)$. It can now be shown that if the inductive hypothesis,
i.e. $\{\varphi(N-1)\}\ P_{N-1}\ \{\psi(N-1)\}$ holds, then proving $\{\varphi(N)\}\ P_N\ \{\psi(N)\}$
reduces to proving $\{\Delta\varphi'(N) \land \psi'(N-1)\}\ \mathrm{peel}(P_N)\ \{\psi(N)\}$, where $\psi'(N-1) \equiv
\exists V_P\, (\psi(N-1) \land D(V_Q, V_P, N-1))$. A few points are worth emphasizing here.
First, if $D(V_Q, V_P, N-1)$ is obtained as a set of equalities, the existential
quantifier in the formula $\psi'(N-1)$ can often be eliminated simply by substitution.
We can also use quantifier elimination capabilities of modern SMT solvers,
viz. Z3 [39], to eliminate the quantifier, if needed. Second, recall that unlike
$\partial P_N$ generated by the technique of [12], $\mathrm{peel}(P_N)$ is guaranteed to be "simpler"
than $P_N$, and is indeed loop-free if $P_N$ has no nested loops. Therefore,
proving $\{\Delta\varphi'(N) \land \psi'(N-1)\}\ \mathrm{peel}(P_N)\ \{\psi(N)\}$ is typically significantly simpler
than proving $\{\psi(N-1) \land \partial\varphi(N)\}\ \partial P_N\ \{\psi(N)\}$. Finally, it may happen
that the pre-condition in $\{\Delta\varphi'(N) \land \psi'(N-1)\}\ \mathrm{peel}(P_N)\ \{\psi(N)\}$ is not
strong enough to yield a proof of the Hoare triple. In such cases, we need to
strengthen the existing pre-condition by a formula, say $\xi'(N-1)$, such that
the strengthened pre-condition implies the weakest pre-condition of $\psi(N)$ under
$\mathrm{peel}(P_N)$. Having a simple structure for $\mathrm{peel}(P_N)$ (e.g., loop-free for the entire
class of programs for which [12] works) makes it significantly easier to compute
the weakest pre-condition. Note that $\xi'(N-1)$ is defined over the variables
in $V_Q$. In order to ensure that the inductive proof goes through, we need
to strengthen the post-condition of the original program by $\xi(N)$ such that
$\xi(N-1) \land D(V_Q, V_P, N-1) \Rightarrow \xi'(N-1)$. Computing $\xi(N-1)$ requires a
special form of logical abduction that ensures that $\xi(N-1)$ refers only to
variables in $V_P$. However, if $D(V_Q, V_P, N-1)$ is given as a set of equalities (as
is often the case), $\xi(N-1)$ can be computed from $\xi'(N-1)$ simply by substitution.
This process of strengthening the pre-condition and post-condition
may need to iterate a few times until a fixed point is reached, similar to what
happens in the inductive step of [12]. Note that the fixed point iterations may
not always converge (verification is undecidable in general). However, in our
experiments, convergence always happened within a few iterations. If $\xi'(N-1)$
denotes the formula obtained on reaching the fixed point, the final Hoare triple
to be proved is $\{\xi'(N-1) \land \Delta\varphi'(N) \land \psi'(N-1)\}\ \mathrm{peel}(P_N)\ \{\xi(N) \land \psi(N)\}$,
where $\psi'(N-1) \equiv \exists V_P\, (\psi(N-1) \land D(V_Q, V_P, N-1))$. Having a simple (often
loop-free) $\mathrm{peel}(P_N)$ significantly simplifies the above process.

We conclude this section by giving an overview of how $Q_{N-1}$ and $\mathsf{peel}(P_N)$
are computed for the program $P_N$ shown in the first column of Fig. 2. The second
column of this figure shows the program obtained from $P_N$ by peeling the last
iteration of each loop of the program. Clearly, the programs in the first and
second columns are semantically equivalent. Since there are no nested loops in
$P_N$, the peels (shown in solid boxes) in the second column are loop-free program
fragments. For each such peel, we identify variables/array elements modified in
the peel and used in subsequent non-peeled parts of the program. For example,
the variable x is modified in the peel of the first loop and used in the body
of the second loop, as shown by the arrow in the second column of Fig. 2. We
replace all such uses (if needed, transitively) by expressions on the right-hand
side of assignments in the peel until no variable/array element modified in the
peel is used in any subsequent non-peeled part of the program. Thus, the use of
x in the body of the second loop is replaced by the expression x + N * N in the
third column of Fig. 2. The peeled iteration of the first loop can now be moved
to the end of the program, since the variables modified in this peel are no longer
used in any subsequent non-peeled part of the program. Repeating the above
steps for the peeled iteration of the second loop, we get the program shown in
the third column of Fig. 2. This effectively gives a transformed program that
can be divided into two parts: (i) a program $Q_{N-1}$ that differs from $P_N$ only
in that all loops are truncated to iterate $N-1$ (instead of $N$) times, and (ii) a
program $\mathsf{peel}(P_N)$ that is obtained by concatenating the peels of loops in $P_N$ in
the same order in which the loops appeared in $P_N$. It is not hard to see that $P_N$,
shown in the first column of Fig. 2, is semantically equivalent to $Q_{N-1}; \mathsf{peel}(P_N)$.
Notice that the construction of $Q_{N-1}$ and $\mathsf{peel}(P_N)$ was fairly straightforward,
and did not require any complex reasoning. In sharp contrast, construction of
$\partial P_N$, as shown in the bottom half of the fourth column of Fig. 2, requires non-trivial
reasoning, and produces a program with two sequentially composed loops.
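
The peel-and-reorder transformation described above can be sanity-checked on a small concrete program. The sketch below is only illustrative: the loop bodies are an assumption chosen to match the description in the text (a first loop that updates x and a, a second loop that reads x, and a repaired use x + N * N), and the function names are hypothetical. It runs the assumed $P_N$ and the assumed $Q_{N-1}; \mathsf{peel}(P_N)$ side by side and compares the final states.

```python
def run_p(n, a):
    """Assumed P_N: two sequentially composed loops, each running n times."""
    x, a, b = 0, list(a), [0] * n
    for i in range(n):
        x = x + n * n
        a[i] = a[i] + n
    for j in range(n):
        b[j] = x + j
    return x, a, b


def run_q_then_peel(n, a):
    """Assumed Q_{N-1} followed by peel(P_N) for the program above."""
    x, a, b = 0, list(a), [0] * n
    # Q_{N-1}: same loops, truncated to n - 1 iterations.
    for i in range(n - 1):
        x = x + n * n
        a[i] = a[i] + n
    for j in range(n - 1):
        # The peel of the first loop has not run yet, so the use of x
        # is repaired to x + n * n.
        b[j] = (x + n * n) + j
    # peel(P_N): the last iteration of each loop, in program order.
    x = x + n * n
    a[n - 1] = a[n - 1] + n
    b[n - 1] = x + (n - 1)
    return x, a, b


# The two versions agree on every final value for small n.
for n in range(1, 8):
    init = list(range(n))
    assert run_p(n, init) == run_q_then_peel(n, init)
```

Agreement on a handful of inputs is of course not a proof of semantic equivalence; it only illustrates why the substitution step is needed before the peel can be moved to the end.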


3 Preliminaries and Notation 


We consider programs generated by the grammar shown below: 


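$$
\begin{array}{rcl}
\mathsf{PB} & ::= & \mathsf{St} \\
\mathsf{St} & ::= & \mathsf{St}\,;\,\mathsf{St} \mid v := \mathsf{E} \mid A[\mathsf{E}] := \mathsf{E} \mid \mathbf{if}(\mathsf{BoolE})\ \mathbf{then}\ \mathsf{St}\ \mathbf{else}\ \mathsf{St} \mid \\
 & & \mathbf{for}\ (\ell := 0;\ \ell < \mathsf{UB};\ \ell := \ell + 1)\ \{\mathsf{St}\} \\
\mathsf{E} & ::= & \mathsf{E}\ \mathsf{op}\ \mathsf{E} \mid A[\mathsf{E}] \mid v \mid \ell \mid c \mid N \\
\mathsf{op} & ::= & + \mid - \mid * \mid / \\
\mathsf{UB} & ::= & \mathsf{UB}\ \mathsf{op}\ \mathsf{UB} \mid \ell \mid c \mid N \\
\mathsf{BoolE} & ::= & \mathsf{E}\ \mathsf{relop}\ \mathsf{E} \mid \mathsf{BoolE}\ \mathsf{AND}\ \mathsf{BoolE} \mid \mathsf{NOT}\ \mathsf{BoolE} \mid \mathsf{BoolE}\ \mathsf{OR}\ \mathsf{BoolE}
\end{array}
$$
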
Formally, we consider a program $P_N$ to be a tuple $(\mathcal{V}, \mathcal{L}, \mathcal{A}, \mathsf{PB}, N)$, where $\mathcal{V}$
is a set of scalar variables, $\mathcal{L} \subseteq \mathcal{V}$ is a set of scalar loop counter variables, $\mathcal{A}$
is a set of array variables, $\mathsf{PB}$ is the program body, and $N$ is a special symbol
denoting a positive integer parameter of the program. In the grammar shown
above, we assume that $A \in \mathcal{A}$, $v \in \mathcal{V} \setminus \mathcal{L}$, $\ell \in \mathcal{L}$ and $c \in \mathbb{Z}$. We also assume
that each loop $L$ has a unique loop counter variable $\ell$ that is initialized at the
beginning of $L$ and is incremented by 1 at the end of each iteration. We assume
that the assignments in the body of $L$ do not update $\ell$. For each loop $L$ with
termination condition $\ell < \mathsf{UB}$, we require that $\mathsf{UB}$ is an expression in terms of
$N$, variables in $\mathcal{L}$ representing loop counters of loops that nest $L$, and constants
as shown in the grammar. Our grammar admits a large class of programs (with
nested loops) that can be analyzed using our technique and that are beyond the
reach of state-of-the-art tools like [1, 12, 42].

We verify Hoare triples of the form $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$, where the formulas
$\varphi(N)$ and $\psi(N)$ are either universally quantified formulas of the form
$\forall I\, (\alpha(I, N) \Rightarrow \beta(\mathcal{A}, \mathcal{V}, I, N))$ or quantifier-free formulas of the form $\eta(\mathcal{A}, \mathcal{V}, N)$.
In these formulas, $I$ is a sequence of array index variables, $\alpha$ is a quantifier-free
formula in the theory of arithmetic over integers, and $\beta$ and $\eta$ are quantifier-free
formulas in the combined theory of arrays and arithmetic over integers.

For technical reasons, we rename all scalar and array variables in the program 
in a pre-processing step as follows. We rename each scalar variable using the well- 
known Static Single Assignment (SSA) [43] technique, such that the variable 
is written at (at most) one location in the program. We also rename arrays 
in the program such that each loop updates its own version of an array and 
multiple writes to an array element within the same loop are performed on 
different versions of that array. We use techniques for array SSA [30] renaming 
studied earlier in the context of compilers, for this purpose. In the subsequent 
exposition, we assume that scalar and array variables in the program are already 
SSA renamed, and that all array and scalar variables referred to in the pre- and 
post-conditions are also expressed in terms of SSA renamed arrays and scalars. 


4 Verification Using Difference Invariants 


The key steps in the application of our technique, as discussed in Sect. 2, are 


A1: Generation of $Q_{N-1}$ and $\mathsf{peel}(P_N)$ from a given $P_N$.

A2: Generation of $\varphi'(N-1)$ and $\Delta\varphi'(N)$ from a given $\varphi(N)$.

A3: Generation of the difference invariant $D(V_Q, V_P, N-1)$, given $\varphi(N-1)$,
$\varphi'(N-1)$, $Q_{N-1}$ and $P_{N-1}$.

A4: Proving $\{\Delta\varphi'(N) \wedge \exists V_P\, (\psi(N-1) \wedge D(V_Q, V_P, N-1))\}\; \mathsf{peel}(P_N)\; \{\psi(N)\}$,
possibly by generation of $\xi'(N-1)$ and $\xi(N)$ to strengthen the pre- and
post-conditions, respectively.


We now discuss techniques for solving each of these sub-problems. 


4.1 Generating $Q_{N-1}$ and $\mathsf{peel}(P_N)$


The procedure illustrated in Fig. 2 (going from the first column to the third
column) is fairly straightforward if none of the loops have any nested loops
within them. It is easy to extend this to arbitrary sequential compositions of
non-nested loops. Having all variables and arrays in SSA-renamed forms makes
it particularly easy to carry out the substitution exemplified by the arrow shown
in the second column of Fig. 2. Hence, we don't discuss any further the generation
of $Q_{N-1}$ and $\mathsf{peel}(P_N)$ when all loops are non-nested.

Handling nested loops is, however, challenging and requires additional care.
Before we present an algorithm for handling this case, we discuss the intuition
using an abstract example. Consider a pair of nested loops, $L_1$ and $L_2$, as shown in
Fig. 3. Suppose that B1 and B3 are loop-free code fragments in the body
of $L_1$ that precede and succeed the nested loop $L_2$. Suppose further that the
loop body, B2, of $L_2$ is loop-free. To focus on the key aspects of computing peels
of nested loops, we make two simplifying assumptions: (i) no scalar variable or
array element modified in B2 is used subsequently (including transitively) in
either B3 or B1, and (ii) every scalar variable or array element that is modified
in B1 and used subsequently in B2, is not modified again in either B1, B2 or B3.
Note that these assumptions are made primarily to simplify the exposition. For
a detailed discussion on how our technique can be used even with some relaxations
of these assumptions, the reader is referred to [13]. The peel of the abstract loops
$L_1$ and $L_2$ is as shown in Fig. 4. The first loop in the peel includes the last
iteration of $L_2$ in each of the $N-1$ iterations of $L_1$, that was missed in $Q_{N-1}$.
The subsequent code includes the last iteration of $L_1$ that was missed in $Q_{N-1}$.

    L1: for (l1 = 0; l1 < N; l1++) {
            B1
    L2:     for (l2 = 0; l2 < N; l2++) {
                B2
            }
            B3
        }

    Fig. 3. A generic nested loop

    for (l1 = 0; l1 < N-1; l1++)
        B2
    B1
    for (l2 = 0; l2 < N; l2++)
        B2
    B3

    Fig. 4. Peel of the nested loop

Formally, we use the notation $L_1(N)$ to denote a loop $L_1$ that has no nested loops
within it, and its loop counter, say $\ell_1$, increases from 0 to an upper bound that is
given by an expression in $N$. Similarly, we use $L_1(N, L_2(N))$ to denote a loop $L_1$
that has another loop $L_2$ nested within it. The loop counter $\ell_1$ of $L_1$ increases
from 0 to an upper bound expression in $N$, while the loop counter $\ell_2$ of $L_2$
increases from 0 to an upper bound expression in $\ell_1$ and $N$. Using this notation,
$L_1(N, L_2(N, L_3(N)))$ represents three nested loops, and so on. Notice that the
upper bound expression for a nested loop can depend not only on $N$ but also on
the loop counters of other loops nesting it. For notational clarity, we also use
$\mathrm{LPeel}(L_i, a, b)$ to denote the peel of loop $L_i$ consisting of all iterations of $L_i$
where the value of $\ell_i$ ranges from $a$ to $b-1$, both inclusive. Note that if $b-a$ is a
constant, this corresponds to the concatenation of $(b-a)$ peels of $L_i$.

We will now try to see how we can implement the transformation from the
first column to the second column of Fig. 2 for a nested loop $L_1(N, L_2(N))$.
The first step is to truncate all loops to use $N-1$ instead of $N$ in the upper
bound expressions. Using the notation introduced above, this gives the loop
$L_1(N-1, L_2(N-1))$. Note that all uses of $N$ other than in loop upper bound
expressions stay unchanged as we go from $L_1(N, L_2(N))$ to $L_1(N-1, L_2(N-1))$.
We now ask: Which are the loop iterations of $L_1(N, L_2(N))$ that have been missed
(or skipped) in going to $L_1(N-1, L_2(N-1))$? Let the upper bound expression of
$L_1$ in $L_1(N, L_2(N))$ be $U_{L_1}(N)$, and that of $L_2$ be $U_{L_2}(\ell_1, N)$. It is not hard to
see that in every iteration $\ell_1$ of $L_1$, where $0 \le \ell_1 < U_{L_1}(N-1)$, the iterations
corresponding to $\ell_2 \in \{U_{L_2}(\ell_1, N-1), \ldots, U_{L_2}(\ell_1, N) - 1\}$ have been missed. In
addition, all iterations of $L_1$ corresponding to $\ell_1 \in \{U_{L_1}(N-1), \ldots, U_{L_1}(N) - 1\}$
have also been missed. This implies that the "peel" of $L_1(N, L_2(N))$ must
include all the above missed iterations. This peel therefore is the program fragment
shown in Fig. 5.

    for (l1 = 0; l1 < U_L1(N-1); l1++)
        LPeel(L2, U_L2(l1, N-1), U_L2(l1, N))
    LPeel(L1, U_L1(N-1), U_L1(N))

    Fig. 5. Peel of L1(N, L2(N))
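
The missed-iteration argument above can be illustrated by enumerating iteration pairs directly. The following Python sketch is a hedged illustration, not the tool's implementation: the function names are hypothetical, the upper bounds are passed in as ordinary Python callables u1(n) and u2(l1, n), and both bounds are assumed to be non-decreasing in n. It lists the iterations of $L_1(N, L_2(N))$, those of the truncated nest, and those of the peel in the shape of Fig. 5, and checks that the last two together cover the first exactly.

```python
def iterations(u1, u2, n):
    """All (l1, l2) pairs executed by L1(N, L2(N)) with bounds u1(n), u2(l1, n)."""
    return [(l1, l2) for l1 in range(u1(n)) for l2 in range(u2(l1, n))]


def truncated_iterations(u1, u2, n):
    """Iterations of L1(N-1, L2(N-1)), i.e. the loop nest kept in Q_{N-1}."""
    return iterations(u1, u2, n - 1)


def peel_iterations(u1, u2, n):
    """(l1, l2) pairs in the peel of Fig. 5: missed inner, then missed outer."""
    missed = []
    for l1 in range(u1(n - 1)):
        # LPeel(L2, U_L2(l1, N-1), U_L2(l1, N))
        missed += [(l1, l2) for l2 in range(u2(l1, n - 1), u2(l1, n))]
    # LPeel(L1, U_L1(N-1), U_L1(N)): whole outer iterations, inner loop intact.
    for l1 in range(u1(n - 1), u1(n)):
        missed += [(l1, l2) for l2 in range(u2(l1, n))]
    return missed


# Example bounds (assumptions for illustration): U_L1(N) = N, U_L2(l1, N) = l1 + N.
u1 = lambda n: n
u2 = lambda l1, n: l1 + n
for n in range(1, 7):
    covered = truncated_iterations(u1, u2, n) + peel_iterations(u1, u2, n)
    assert sorted(covered) == sorted(iterations(u1, u2, n))
```

With these linear bounds, each inner peel contributes exactly one iteration per surviving outer iteration, which is the situation in which the peel loses one level of nesting.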

Notice that if $U_{L_2}(\ell_1, N) - U_{L_2}(\ell_1, N-1)$ is a constant (as is the case if
$U_{L_2}(\ell_1, N)$ is any linear function of $\ell_1$ and $N$), then the peel does not have
any loop with nesting depth 2. Hence, the maximum nesting depth of loops in the
peel is strictly less than that in $L_1(N, L_2(N))$, yielding a peel that is "simpler"
than the original program. This argument can be easily generalized to loops with
arbitrarily large nesting depths. The peel of $L_1(N, L_2(N, L_3(N)))$ is as shown in Fig. 6.

    for (l1 = 0; l1 < U_L1(N-1); l1++) {
        for (l2 = 0; l2 < U_L2(l1, N-1); l2++)
            LPeel(L3, U_L3(l1, l2, N-1), U_L3(l1, l2, N))
        LPeel(L2, U_L2(l1, N-1), U_L2(l1, N))
    }
    LPeel(L1, U_L1(N-1), U_L1(N))

    Fig. 6. Peel of L1(N, L2(N, L3(N)))

    (a)                              (b)
    for (i = 0; i < N; i++)          for (i = 0; i < N-1; i++)
        for (j = 0; j < N; j++)          A[i][N-1] = N;
            A[i][j] = N;             for (j = 0; j < N; j++)
                                         A[N-1][j] = N;

    Fig. 7. (a) Nested Loop & (b) Peel

As an illustrative example, let us consider the program in Fig. 7(a), and suppose
we wish to compute the peel of this program containing nested loops. In this case,
the upper bounds of the loops are $U_{L_1}(N) = U_{L_2}(N) = N$. The peel is shown
in Fig. 7(b) and consists of two sequentially composed non-nested loops. The
first loop takes into account the missed iterations of the inner loop (a single
iteration in this example) that are executed in $P_N$ but are missed in $Q_{N-1}$. The
Algorithm 1. GENQANDPEEL($P_N$: program)

 1: Let sequentially composed loops in $P_N$ be in the order $L_1, L_2, \ldots, L_m$;
 2: for each loop $L_i \in$ TOPLEVELLOOPS($P_N$) do
 3:     $\langle Q_{L_i}, R_{L_i} \rangle \leftarrow$ GENQANDPEELFORLOOP($L_i$);
 4: while $\exists v.\ use(v) \in Q_{L_i} \wedge def(v) \in R_{L_j}$, for some $1 \le j < i \le m$ do    ▷ v is var/array element
 5:     Substitute rhs expression for $v$ from $R_{L_j}$ in $Q_{L_i}$;    ▷ If $R_{L_j}$ is a loop, abort
 6: $Q_{N-1} \leftarrow Q_{L_1}; Q_{L_2}; \ldots; Q_{L_m}$;
 7: $\mathsf{peel}(P_N) \leftarrow R_{L_1}; R_{L_2}; \ldots; R_{L_m}$;
 8: return $\langle Q_{N-1}, \mathsf{peel}(P_N) \rangle$;

 9: procedure GENQANDPEELFORLOOP($L$: loop)
10:     Let $U_L(N)$ be the UB expression of loop $L$;
11:     $Q_L \leftarrow L$ with $N-1$ substituted for $N$ in all UB expressions (including for nested loops);
12:     if $L$ has subloops then
13:         $t \leftarrow$ nesting depth of inner-most nested loop in $L$;
14:         $R_{t+1} \leftarrow$ empty program with no statements;
15:         for $k = t$; $k \ge 2$; $k$-- do
16:             for each subloop $SL_j$ in $L$ at nesting depth $k$ do    ▷ Ordered $SL_1, SL_2, \ldots, SL_j$
17:                 $R_{SL_j} \leftarrow \mathrm{LPeel}(SL_j, U_{SL_j}(\ell_1, \ldots, \ell_{k-1}, N-1), U_{SL_j}(\ell_1, \ldots, \ell_{k-1}, N))$;
18:             $R_k \leftarrow$ for ($\ell_{k-1} = 0$; $\ell_{k-1} < U_{L_{k-1}}(N-1)$; $\ell_{k-1}$++) { $R_{k+1}; R_{SL_1}; R_{SL_2}; \ldots; R_{SL_j}$ };
19:         $R_L \leftarrow R_2; \mathrm{LPeel}(L, U_L(N-1), U_L(N))$;
20:     else
21:         $R_L \leftarrow \mathrm{LPeel}(L, U_L(N-1), U_L(N))$;
22:     return $\langle Q_L, R_L \rangle$;



second loop takes into account the missed iterations of the outer loop in $Q_{N-1}$
compared to $P_N$.
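
The Fig. 7 example is small enough to execute directly. The Python sketch below (function names are hypothetical; the array is modelled as a list of lists initialised to zero, which is an assumption made only for the comparison) runs the nested loop of Fig. 7(a) and the truncated nest followed by the two peel loops of Fig. 7(b), and checks that both leave the array in the same state.

```python
def run_fig7_nested(n):
    """Fig. 7(a): every cell of an n x n array is set to n."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a[i][j] = n
    return a


def run_fig7_q_then_peel(n):
    """Truncated nest (Q_{N-1}) followed by the peel of Fig. 7(b)."""
    a = [[0] * n for _ in range(n)]
    # Q_{N-1}: both bounds truncated to n - 1; the stored value stays n.
    for i in range(n - 1):
        for j in range(n - 1):
            a[i][j] = n
    # Peel, first loop: last inner iteration for each surviving outer iteration.
    for i in range(n - 1):
        a[i][n - 1] = n
    # Peel, second loop: the last outer iteration in full.
    for j in range(n):
        a[n - 1][j] = n
    return a


for n in range(1, 7):
    assert run_fig7_nested(n) == run_fig7_q_then_peel(n)
```

Note that the value written by the truncated nest is still n rather than n - 1: only the loop bounds change when going from the original program to $Q_{N-1}$.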

Generalizing the above intuition, Algorithm 1 presents function GENQANDPEEL
for computing $Q_{N-1}$ and $\mathsf{peel}(P_N)$ for a given $P_N$ that has sequentially
composed loops with potentially nested loops. Due to the grammar of our programs,
our loops are well nested. The method works by traversing over the
structure of loops in the program. In this algorithm $Q_{L_i}$ and $R_{L_i}$ represent the
counterparts of $Q_{N-1}$ and $\mathsf{peel}(P_N)$ for loop $L_i$. We create the program $Q_{N-1}$
by peeling each loop in the program and then propagating these peels across
subsequent loops. We identify the missed iterations of each loop in the program
$P_N$ from the upper bound expression UB. Recall that the upper bound
of each loop $L_k$ at nesting depth $k$, denoted by $U_{L_k}$, is in terms of the loop
counters $\ell$ of outer loops and the program parameter $N$. We need to peel
$U_{L_k}(\ell_1, \ell_2, \ldots, \ell_{k-1}, N) - U_{L_k}(\ell_1, \ell_2, \ldots, \ell_{k-1}, N-1)$ iterations from
each loop, where $\ell_1, \ell_2, \ldots, \ell_{k-1}$ are counters of the outer nesting loops.
As discussed above, whenever this difference is a constant value, we are guaranteed
that the loop nesting depth reduces by one. It may so happen that there
are multiple sequentially composed loops $SL_j$ at nesting depth $k$ and not just
a single loop $L_k$. At line 2, we iterate over top level loops and call function
GENQANDPEELFORLOOP($L_i$) for each sequentially composed loop $L_i$ in $P_N$. At
line 11 we construct $Q_L$ for loop $L$. If the loop $L$ has no nested loops, then the
peel consists of the last iterations, computed using the upper bound in line 21. For nested
loops, the loop at line 15 builds the peel for all loops inside $L$ following the above
intuition. The peels of all sub-loops are collected and inserted in the peel of $L$
at line 19. Since all the peeled iterations are moved after $Q_L$ of each loop, we
need to repair expressions appearing in $Q_L$. The repairs are applied by the loop
at line 4. In the repair step, we identify the right hand side expressions for all
the variables and array elements assigned in the peeled iterations. Subsequently,
the uses of the variables and arrays in $Q_{L_i}$ that are assigned in $R_{L_j}$ are replaced
with the assigned expressions whenever $j < i$. If $R_{L_j}$ is a loop, this step is more
involved and hence currently not considered. Finally at line 8, the peels and Qs
of all top level loops are stitched and returned.

Note that lines 4 and 5 of Algorithm 1 implement the substitution represented
by the arrow in the second column of Fig. 2. This is necessary in order to
move the peel of a loop to the end of the program. If either of the loops $L_i$ or
$L_j$ use array elements as index to other arrays then it can be difficult to identify
what expression to use in $Q_{L_i}$ for the substitution. However, such scenarios are
observed less often, and hence, they hardly impact the effectiveness of the technique
on programs seen in practice. The peel $R_{L_j}$, from which the expression to
be substituted in $Q_{L_i}$ has to be taken, itself may have a loop. In such cases, it
can be significantly more challenging to identify what expression to use in $Q_{L_i}$.
We use several optimizations to transform the peeled loop before trying to identify
such an expression. If the modified values in the peel can be summarized
as closed form expressions, then we can replace the loop in the peel with its
summary. For example, consider the peeled loop, for (l1 = 0; l1 < N; l1++) {
S = S + 1; }. This loop is summarized as S = S + N; before it can be moved
across subsequent code. If the variables modified in the peel of a nested loop are
not used later, then the peel can be trivially moved. In many cases, the loop in
the peel can also be substituted with its conservative over-approximation. We
have implemented some of these optimizations in our tool and are able to verify
several benchmarks with sequentially composed nested loops. It may not always
be possible to move the peel of a nested loop across subsequent loops but we have
observed that these optimizations suffice for many programs seen in practice.
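
The loop-summarization step mentioned above can be illustrated with the example from the text. The Python sketch below (hypothetical function names; a non-negative loop bound is assumed) compares the peeled counting loop with its closed-form summary S = S + N.

```python
def peeled_loop(s, n):
    """The peeled loop from the text: n unit increments of S."""
    for _ in range(n):
        s = s + 1
    return s


def summarized(s, n):
    """Closed-form summary S = S + N (assumes n >= 0)."""
    return s + n


# The summary agrees with the loop for every tested bound and start value.
for n in range(0, 10):
    for s0 in (-3, 0, 5):
        assert peeled_loop(s0, n) == summarized(s0, n)
```

Once the loop is replaced by a single assignment, the right-hand side S + N is available for the substitution performed in lines 4 and 5 of Algorithm 1.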


Theorem 1. Let $Q_{N-1}$ and $\mathsf{peel}(P_N)$ be generated by application of function
GENQANDPEEL from Algorithm 1 on program $P_N$. Then $P_N$ is semantically
equivalent to $Q_{N-1}; \mathsf{peel}(P_N)$.


Lemma 1. Suppose the following conditions hold:


- Program $P_N$ satisfies our syntactic restrictions (see Sect. 3).
- The upper bound expressions of all loops are linear expressions in $N$ and in
the loop counters of outer nesting loops.


Then, the max nesting depth of loops in $\mathsf{peel}(P_N)$ is strictly less than that in $P_N$.


Proof. Let $U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N)$ be the upper bound expression of a loop
$L_k$ at nesting depth $k$. Suppose $U_{L_k} = c_1 \cdot \ell_1 + \cdots + c_{k-1} \cdot \ell_{k-1} + C \cdot N + D$,
where $c_1, \ldots, c_{k-1}$, $C$ and $D$ are constants. Then
$U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N) - U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N-1) = C$, i.e. a constant. Now, recalling the discussion in
Sect. 4.1, we see that $\mathrm{LPeel}(L_k, U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N-1), U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N))$
simply results in concatenating a constant number of peels of the loop $L_k$. Hence,
the maximum nesting depth of loops in $\mathrm{LPeel}(L_k, U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N-1),
U_{L_k}(\ell_1, \ldots, \ell_{k-1}, N))$ is strictly less than the maximum nesting depth of loops
in $L_k$.

Suppose loop $L$ with nested loops (having maximum nesting depth $t$) is passed
as the argument of function GENQANDPEELFORLOOP (see Algorithm 1). In
line 15 of function GENQANDPEELFORLOOP, we iterate over all loops at nesting
depth 2 and above within $L$. Let $L_k$ be a loop at nesting depth $k$, where $2 \le k \le t$.
Clearly, $L_k$ can have at most $t - k$ nested levels of loops within it. Therefore,
when LPeel is invoked on such a loop, the maximum nesting depth of loops
in the peel generated for $L_k$ can be at most $t - k - 1$. From lines 18 and 19
of function GENQANDPEELFORLOOP, we also know that this LPeel can itself
appear at nesting depth $k$ of the overall peel $R_L$. Hence, the maximum nesting
depth of loops in $R_L$ can be $t - k - 1 + k$, i.e. $t - 1$. This is strictly less than the
maximum nesting depth of loops in $L$.


Corollary 1. If $P_N$ has no nested loops, then $\mathsf{peel}(P_N)$ is loop-free.


4.2 Generating $\varphi'(N-1)$ and $\Delta\varphi'(N)$

Given $\varphi(N)$, we check if it is of the form $\bigwedge_{i=0}^{N-1} \rho_i$, where $\rho_i$ is a formula on
the $i^{th}$ elements of one or more arrays, and scalars used in $P_N$. If so, we infer
$\varphi'(N-1)$ to be $\bigwedge_{i=0}^{N-2} \rho_i$ and $\Delta\varphi'(N)$ to be $\rho_{N-1}$ (assuming variables/array
elements in $\rho_{N-1}$ are not modified by $Q_{N-1}$). Note that all uses of $N$ in $\rho_i$ are
retained as is (i.e. not changed to $N-1$) in $\varphi'(N-1)$. In general, when deriving
$\varphi'(N-1)$, we do not replace any use of $N$ in $\varphi(N)$ by $N-1$ unless it is the
limit of an iterated conjunct as discussed above. Specifically, if $\varphi(N)$ doesn't
contain an iterated conjunct as above, then we consider $\varphi'(N-1)$ to be the
same as $\varphi(N)$ and $\Delta\varphi'(N)$ to be True. Thus, our generation of $\varphi'(N-1)$ and
$\Delta\varphi'(N)$ differs from that of [12]. As discussed earlier, this makes it possible to
reason about a much larger class of pre-conditions than that admissible by the
technique of [12].
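
The split of an iterated-conjunct pre-condition can be sketched executably. In the Python fragment below, everything is an illustrative assumption: the pre-condition is modelled as a predicate rho(i, a, n) evaluated over a concrete array, and the three function names are hypothetical. It shows that only the limit of the conjunction moves from N to N - 1, while uses of N inside each conjunct are kept.

```python
def phi(n, a, rho):
    """phi(N): conjunction of rho(i, a, N) for i in [0, N)."""
    return all(rho(i, a, n) for i in range(n))


def phi_prime(n, a, rho):
    """phi'(N-1): conjuncts for i in [0, N-1); N inside rho is NOT replaced."""
    return all(rho(i, a, n) for i in range(n - 1))


def delta_phi_prime(n, a, rho):
    """Delta phi'(N): the single remaining conjunct rho_{N-1}."""
    return rho(n - 1, a, n)


# Example conjunct (an assumption for illustration): every element equals N.
rho = lambda i, a, n: a[i] == n

for arr in ([3, 3, 3], [3, 3, 4], [2, 3, 3]):
    n = len(arr)
    # For this shape of pre-condition the split is exact.
    assert phi(n, arr, rho) == (phi_prime(n, arr, rho) and delta_phi_prime(n, arr, rho))
```

If the conjunct had been rewritten to compare against n - 1 in phi_prime, the first array above would satisfy phi but not the split, which is why uses of N inside the conjuncts are left untouched.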


4.3 Inferring Inductive Difference Invariants 


Once we have $P_{N-1}$, $Q_{N-1}$, $\varphi(N-1)$ and $\varphi'(N-1)$, we infer difference invariants.
We construct the standard cross-product of programs $Q_{N-1}$ and $P_{N-1}$, denoted
as $Q_{N-1} \times P_{N-1}$, and infer difference invariants at key control points. Note that
$P_{N-1}$ and $Q_{N-1}$ are guaranteed to have synchronized iterations of corresponding
loops (both are obtained by restricting the upper bounds of all loops to use
$N-1$ instead of $N$). However, the conditional statements within the loop body
may not be synchronized. Thus, whenever we can infer that the corresponding
conditions are equivalent, we synchronize the branches of the conditional statement.
Otherwise, we consider all four possibilities of the branch conditions. It
can be seen that the net effect of the cross-product is executing the programs
$P_{N-1}$ and $Q_{N-1}$ one after the other.

We run a dataflow analysis pass over the constructed product graph to infer
difference invariants at loop head, loop exit and at each branch condition. The
only dataflow values of interest are differences between corresponding variables
in $Q_{N-1}$ and $P_{N-1}$. Indeed, since the structure and variables of $Q_{N-1}$ and $P_{N-1}$ are
similar, we can create the correspondence map between the variables. We start
the difference invariant generation by considering relations between corresponding
variables/array elements appearing in pre-conditions of the two programs.
We apply static analysis that can track equality expressions (including disjunctions
over equality expressions) over variables as we traverse the program. These
equality expressions are our difference invariants.

We observed in our experiments that most of the inferred equality expressions
are simple expressions of $N$ (at most quadratic in $N$). This is not totally surprising,
and similar observations have also been independently made in [4, 15, 24]. Note
that the difference invariants may not always be equalities. We can easily extend
our analysis to learn inequalities using interval domains in static analysis. We
can also use a library of expressions to infer difference invariants using a guess-and-check
framework. Moreover, guessing difference invariants can be easy as
in many cases the difference expressions may be independent of the program
constructs, for example, the equality expression $v = v'$ where $v \in P_{N-1}$ and
$v' \in Q_{N-1}$ does not depend on any other variable from the two programs.

For the example in Fig. 2, the difference invariant at the head of the
first loop of $Q_{N-1} \times P_{N-1}$ is $D(V_Q, V_P, N-1) \equiv (x' - x = i \times (2 \times N - 1)
\wedge \forall i \in [0, N-1),\ a'[i] - a[i] = 1)$, where $x, a \in V_P$ and $x', a' \in V_Q$. Given
this, we easily get $x' - x = (N-1) \times (2 \times N - 1)$ when the first loop terminates.
For the second loop, $D(V_Q, V_P, N-1) \equiv (\forall j \in [0, N-1),\ b'[j] - b[j] =
(x' - x) + N^2 = (N-1) \times (2 \times N - 1) + N^2)$.
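
These difference invariants can be checked numerically on a concrete instance. The Python sketch below rests on an assumed program shape that is consistent with the invariants stated above (a first loop adding N * N to x and N to a[i], a second loop storing x + j into b[j], with the use of x repaired to x + N * N in $Q_{N-1}$); the function names are hypothetical and the actual statements of Fig. 2 may differ.

```python
def run_p_prev(n, a):
    """Assumed P_{N-1}: the program instantiated with parameter n - 1."""
    m = n - 1
    x, a, b = 0, list(a), [0] * m
    for i in range(m):
        x = x + m * m
        a[i] = a[i] + m
    for j in range(m):
        b[j] = x + j
    return x, a, b


def run_q_prev(n, a):
    """Assumed Q_{N-1}: parameter n, loops truncated to n - 1, use of x repaired."""
    m = n - 1
    x, a, b = 0, list(a), [0] * m
    for i in range(m):
        x = x + n * n
        a[i] = a[i] + n
    for j in range(m):
        b[j] = (x + n * n) + j
    return x, a, b


for n in range(2, 9):
    init = [7 * k for k in range(n)]          # arbitrary: the pre-condition is True
    xp, ap, bp = run_p_prev(n, init)
    xq, aq, bq = run_q_prev(n, init)
    diff = (n - 1) * (2 * n - 1)
    assert xq - xp == diff                                    # x' - x at loop exit
    assert all(aq[i] - ap[i] == 1 for i in range(n - 1))      # a'[i] - a[i] = 1
    assert all(bq[j] - bp[j] == diff + n * n for j in range(n - 1))
```

The check is independent of any post-condition, in line with the observation that the difference invariants are computed once per program.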

Note that the difference invariants and their computation are agnostic of the
given post-condition. Hence, our technique does not need to re-run this analysis
for proving a different post-condition for the same program.


4.4 Verification Using Inductive Difference Invariants 


We present our method DIFFY for verification of programs using inductive
difference invariants in Algorithm 2. It takes a Hoare triple $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$
as input, where $\varphi(N)$ and $\psi(N)$ are pre- and post-condition formulas. We check
the base case in line 1 to verify the Hoare triple for $N = 1$. If this check fails,
we report a counterexample. Subsequently, we compute $Q_{N-1}$ and $\mathsf{peel}(P_N)$ as
described in Sect. 4.1 using the function GENQANDPEEL from Algorithm 1. At
line 4, we compute the formulas $\varphi'(N-1)$ and $\Delta\varphi'(N)$ as described in Sect. 4.2.
For automation, we analyze the quantifiers appearing in $\varphi(N)$ and modify the
quantifier ranges such that the conditions in Sect. 4.2 hold. We infer difference
invariants $D(V_Q, V_P, N-1)$ on line 5 using the method described in Sect. 4.3,
wherein $V_Q$ and $V_P$ are sets of variables from $Q_{N-1}$ and $P_{N-1}$ respectively.
At line 6, we compute $\psi'(N-1)$ by eliminating the variables $V_P$ of $P_{N-1}$ from
$\psi(N-1) \wedge D(V_Q, V_P, N-1)$. At line 7, we check the inductive step of our analysis.
If the inductive step succeeds, then we conclude that the assertion holds.

Algorithm 2. DIFFY($\{\varphi(N)\}\; P_N\; \{\psi(N)\}$)

 1: if $\{\varphi(1)\}\; P_1\; \{\psi(1)\}$ fails then    ▷ Base case for N=1
 2:     return "Counterexample found!";
 3: $\langle Q_{N-1}, \mathsf{peel}(P_N) \rangle \leftarrow$ GENQANDPEEL($P_N$);
 4: $\langle \varphi'(N-1), \Delta\varphi'(N) \rangle \leftarrow$ FORMULADIFF($\varphi(N)$);    ▷ $\varphi(N) \Rightarrow \varphi'(N-1) \wedge \Delta\varphi'(N)$
 5: $D(V_Q, V_P, N-1) \leftarrow$ INFERDIFFINVS($Q_{N-1}$, $P_{N-1}$, $\varphi'(N-1)$, $\varphi(N-1)$);
 6: $\psi'(N-1) \leftarrow$ QE($V_P$, $\psi(N-1) \wedge D(V_Q, V_P, N-1)$);
 7: if $\{\psi'(N-1) \wedge \Delta\varphi'(N)\}\; \mathsf{peel}(P_N)\; \{\psi(N)\}$ then
 8:     return True;    ▷ Verification Successful
 9: else
10:     return STRENGTHEN($P_N$, $\mathsf{peel}(P_N)$, $\varphi(N)$, $\psi(N)$, $\psi'(N-1)$, $\Delta\varphi'(N)$, $D(V_Q, V_P, N)$);

11: procedure STRENGTHEN($P_N$, $\mathsf{peel}(P_N)$, $\varphi(N)$, $\psi(N)$, $\psi'(N-1)$, $\Delta\varphi'(N)$, $D(V_Q, V_P, N)$)
12:     $\chi(N) \leftarrow \psi(N)$;
13:     $\xi(N) \leftarrow$ True;
14:     $\xi'(N-1) \leftarrow$ True;
15:     repeat
16:         $\chi'(N-1) \leftarrow$ WP($\chi(N)$, $\mathsf{peel}(P_N)$);    ▷ Dijkstra's WP for loop free code
17:         if $\chi'(N-1) = \emptyset$ then
18:             if $\mathsf{peel}(P_N)$ has a loop then
19:                 return DIFFY($\{\xi'(N-1) \wedge \Delta\varphi'(N) \wedge \psi'(N-1)\}\; \mathsf{peel}(P_N)\; \{\xi(N) \wedge \psi(N)\}$);
20:             else
21:                 return False;    ▷ Unable to prove
22:         $\chi(N) \leftarrow$ QE($V_Q$, $\chi'(N) \wedge D(V_Q, V_P, N)$);
23:         $\xi(N) \leftarrow \xi(N) \wedge \chi(N)$;
24:         $\xi'(N-1) \leftarrow \xi'(N-1) \wedge \chi'(N-1)$;
25:         if $\{\varphi(1)\}\; P_1\; \{\xi(1)\}$ fails then
26:             return False;    ▷ Unable to prove
27:         if $\{\xi'(N-1) \wedge \Delta\varphi'(N) \wedge \psi'(N-1)\}\; \mathsf{peel}(P_N)\; \{\xi(N) \wedge \psi(N)\}$ holds then
28:             return True;    ▷ Verification Successful
29:     until timeout;
30:     return False;

If that is not the case, then we try to iteratively strengthen both the pre- and
post-condition of $\mathsf{peel}(P_N)$ simultaneously by invoking STRENGTHEN.

The function STRENGTHEN first initializes the formula $\chi(N)$ with $\psi(N)$ and
the formulas $\xi(N)$ and $\xi'(N-1)$ to True. To strengthen the pre-condition of
$\mathsf{peel}(P_N)$, we infer a formula $\chi'(N-1)$ using Dijkstra's weakest pre-condition
computation of $\chi(N)$ over $\mathsf{peel}(P_N)$ in line 16. It may happen that we are
unable to infer such a formula. In such a case, if the program $\mathsf{peel}(P_N)$ has
loops then we recursively invoke DIFFY at line 19 to further simplify the program.
Otherwise, we abandon the verification effort (line 21). We use quantifier
elimination to infer $\chi(N-1)$ from $\chi'(N-1)$ and $D(V_Q, V_P, N-1)$ at line 22.

The inferred pre-conditions $\chi(N)$ and $\chi'(N-1)$ are accumulated in $\xi(N)$ and
$\xi'(N-1)$, which strengthen the post-conditions of $P_N$ and $Q_{N-1}$ respectively in
lines 23-24. We again check the base case for the inferred formulas in $\xi(N)$ at
line 25. If the check fails we abandon the verification attempt at line 26. If the
base case succeeds, we then proceed to the inductive step. When the inductive
step succeeds, we conclude that the assertion is verified. Otherwise, we continue
in the loop and try to infer more pre-conditions until we run out of time.

The pre-condition in Fig. 2 is $\varphi(N) \equiv$ True and the post-condition is $\psi(N) \equiv
(\forall j \in [0, N),\ b[j] = j + N^3)$. At line 4, $\varphi'(N-1)$ and $\Delta\varphi'(N)$ are computed
to be True. $D(V_Q, V_P, N-1)$ is the formula computed in Sect. 4.3. At line 6,

Table 1. Summary of the experimental results. S is successful result. U is inconclusive
result. TO is timeout.

PROGRAM    |       |     DIFFY      |   VAJRA   |  VERIABS  |      VIAP
CATEGORY   | Count |   S |  U | TO  |   S |   U |   S |  TO |   S |  U |  TO
-----------+-------+-----+----+-----+-----+-----+-----+-----+-----+----+-----
Safe C1    |   110 | 110 |  0 |  0  | 110 |   0 |  96 |  14 |  16 |  1 |  93
Safe C2    |    24 |  21 |  0 |  3  |   0 |  24 |   5 |  19 |   4 |  0 |  20
Safe C3    |    23 |  20 |  3 |  0  |   0 |  23 |   9 |  14 |   0 | 23 |   0
Total      |   157 | 151 |  3 |  3  | 110 |  47 | 110 |  47 |  20 | 24 | 113
Unsafe C1  |    99 |  98 |  1 |  0  |  98 |   1 |  84 |  15 |  98 |  0 |   1
Unsafe C2  |    24 |  24 |  0 |  0  |  17 |   7 |  19 |   5 |  22 |  0 |   2
Unsafe C3  |    23 |  20 |  3 |  0  |   0 |  23 |  22 |   1 |   0 | 23 |   0
Total      |   146 | 142 |  4 |  0  | 115 |  31 | 125 |  21 | 120 | 23 |   3

$\psi'(N-1) \equiv (\forall j \in [0, N-1),\ b'[j] = j + (N-1)^3 + (N-1) \times (2 \times N - 1) + N^2 =
j + N^3)$. The algorithm then invokes STRENGTHEN at line 10 which infers the
formulas $\chi'(N-1) \equiv (x' = (N-1)^3)$ at line 16 and $\chi(N) \equiv (x = N^3)$ at line 22.
These are accumulated in $\xi'(N-1)$ and $\xi(N)$, simultaneously strengthening the
pre- and post-condition. Verification succeeds after this strengthening iteration.
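
The simplification of the right-hand side of $\psi'(N-1)$ to $j + N^3$ relies on a polynomial identity in $N$. For readers who want to check it by hand, the expansion is spelled out below; this is only elementary algebra and makes no assumption beyond the terms that appear in the formula.

```latex
\begin{align*}
(N-1)^3 + (N-1)(2N-1) + N^2
  &= (N^3 - 3N^2 + 3N - 1) + (2N^2 - 3N + 1) + N^2 \\
  &= N^3 + (-3N^2 + 2N^2 + N^2) + (3N - 3N) + (-1 + 1) \\
  &= N^3 .
\end{align*}
```

Adding $j$ to both sides gives the stated form $b'[j] = j + N^3$ for every $j \in [0, N-1)$.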
The following theorem guarantees the soundness of our technique. 


Theorem 2. Suppose there exist formulas $\xi'(N)$ and $\xi(N)$ and an integer $M > 0$
such that the following hold


- $\{\varphi(N)\}\; P_N\; \{\psi(N) \wedge \xi(N)\}$ holds for $1 \le N \le M$.

- $\xi(N) \wedge D(V_Q, V_P, N) \Rightarrow \xi'(N)$ for all $N > 0$.

- $\{\xi'(N-1) \wedge \Delta\varphi'(N) \wedge \psi'(N-1)\}\; \mathsf{peel}(P_N)\; \{\xi(N) \wedge \psi(N)\}$ holds for all
$N > M$, where $\psi'(N-1) \equiv \exists V_P\, (\psi(N-1) \wedge D(V_Q, V_P, N-1))$.


Then $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ holds for all $N > 0$.


5 Experimental Evaluation 


We have instantiated our technique in a prototype tool called DIFFY. It is written 
in C++ and is built using the LLVM (v6.0.0) [31] compiler. We use the SMT solver 
Z3(v4.8.7) [39] for proving Hoare triples of loop-free programs. DIFFY and the 
supporting data to replicate the experiments are openly available at [14]. 


Setup. All experiments were performed on a machine with an Intel i7-6500U CPU
running at 2.5 GHz, 16 GB RAM, and the Ubuntu 18.04.5 LTS operating system.
We have compared the results obtained from DIFFY with VAJRA (v1.0) [12],
VIAP (v1.1) [42] and VERIABS (v1.4.1-12) [1]. We chose VAJRA, which also
employs inductive reasoning for proving array programs, and verified the benchmarks
in its test-suite. We compared with VERIABS as it is the winner of the
arrays sub-category in SV-COMP 2020 [6] and 2021 [7]. VERIABS applies a
Fig. 8. Cactus Plots (a) All Safe Benchmarks (b) All Unsafe Benchmarks


sequence of techniques from its portfolio to verify array programs. We compared
with VIAP which was the winner in the arrays sub-category in SV-COMP 2019 [5].
VIAP also employs a sequence of tactics, implemented for proving a variety of
array programs. DIFFY does not use multiple techniques; however, we chose to
compare it with these portfolio verifiers to show that it performs well on a class
of programs and can be a part of their portfolio. All tools take C programs in the
SV-COMP format as input. A timeout of 60 s was set for each tool. A summary
of the results is presented in Table 1.


Benchmarks. We have evaluated DIFFY on a set of 303 array benchmarks, 
comprising the entire test-suite of [12], enhanced with challenging benchmarks 
to test the efficacy of our approach. These benchmarks take a symbolic parameter 
N which specifies the size of each array. Assertions are (in-)equalities over array 
elements, scalars and (non-)linear polynomial terms over N. We have divided 
both the safe and unsafe benchmarks into three categories. The C1 category 
contains benchmarks with standard array operations such as min, max, init, copy 
and compare, as well as benchmarks that compute polynomials. In these 
benchmarks, branch conditions are not affected by the value of N, and operations 
such as modulo and nested loops are not present. There are 110 safe and 99 unsafe 
programs in the C1 category in our test-suite. In the C2 category, the branch 
conditions are affected by changes in the program parameter N, and operations 
such as modulo are used. These benchmarks do not have nested loops. There are 
24 safe and 24 unsafe benchmarks in the C2 category. Benchmarks in category 
C3 are programs with at least one nested loop. There are 23 safe and 23 unsafe 
programs in category C3 in our test-suite. The test-suite has a total of 
157 safe and 146 unsafe programs. 
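
To make the three categories concrete, the sketch below gives one tiny program per category. These programs are hypothetical illustrations written for this passage, not entries from the test-suite; the actual benchmarks are C programs in the SV-COMP format, and Python is used here only to keep the sketch short. Each function takes the parameter N as n and returns whether its assertion holds for that value.

```python
def c1_example(n):
    """C1-style (hypothetical): one loop, no branch on N, no modulo.

    The assertion contains a non-linear polynomial term over N.
    """
    a = [0] * n
    s = 0
    for i in range(n):
        a[i] = i
        s += a[i]
    return 2 * s == n * (n - 1)


def c2_example(n):
    """C2-style (hypothetical): the branch condition depends on N and
    uses modulo; there is no nested loop."""
    a = [0] * n
    for i in range(n):
        if (i + n) % 2 == 0:
            a[i] = n
        else:
            a[i] = n + 1
    return all(n <= x <= n + 1 for x in a)


def c3_example(n):
    """C3-style (hypothetical): contains a nested loop."""
    a = [0] * n
    for i in range(n):
        for _ in range(n):
            a[i] += 1
    return all(x == n for x in a)
```

Running these functions for a few concrete values of n only tests the assertions; a verifier has to establish them for every N.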


Analysis. DIFFY verified 151 safe benchmarks, compared to 110 verified by 
each of VAJRA and VERIABS, and 20 verified by VIAP. DIFFY was unable to 
verify 6 safe benchmarks. In 3 cases, the SMT solver timed out while trying to 
prove the induction step since the formulated query had a modulus operation, 
and in 3 cases it was unable to compute the predicates needed to prove the 
assertions. VAJRA was unable to verify 47 programs from categories C2 and 
C3. These are programs with nested loops, branch conditions affected by N, 
and cases where it could not compute the difference program. The sequence 
of techniques employed by VERIABS ran out of time on 47 programs while 
trying to prove the given assertion. VERIABS proved 2 benchmarks in category 
C2 and 3 benchmarks in category C3 where DIFFY was inconclusive or timed 
out. VERIABS spends a considerable amount of time on different techniques in its 
portfolio before it resorts to VAJRA, and hence it could not verify 14 programs 
that VAJRA was able to prove efficiently. VIAP was inconclusive on 24 programs 
which had nested loops or constructs that could not be handled by the tool. It 
ran out of time on 113 benchmarks as the initial tactics in its sequence took up 
the allotted time but could not verify the benchmarks. DIFFY was able to verify 
all programs that VIAP and VAJRA were able to verify within the specified time 
limit. 

Fig. 9. Cactus plots (a) Safe C1 benchmarks (b) Unsafe C1 benchmarks 
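
As a quick consistency check on the figures quoted above, the following sketch tallies the reported outcomes per tool on the safe benchmarks and confirms that each tally adds up to the 157 safe programs in the test-suite. The numbers are copied from the text; the dictionary layout and the function name are assumptions made for this illustration.

```python
SAFE_TOTAL = 157  # total number of safe benchmarks reported in the text

# Outcomes on safe benchmarks, as reported in the text.
SAFE_OUTCOMES = {
    "DIFFY": {"verified": 151, "solver timeout": 3, "inconclusive": 3},
    "VAJRA": {"verified": 110, "not verified": 47},
    "VERIABS": {"verified": 110, "timeout": 47},
    "VIAP": {"verified": 20, "inconclusive": 24, "timeout": 113},
}


def inconsistent_tools(outcomes, total):
    """Return the tools whose outcome counts do not sum to `total`."""
    return [tool for tool, counts in outcomes.items()
            if sum(counts.values()) != total]
```

An empty result from inconsistent_tools(SAFE_OUTCOMES, SAFE_TOTAL) means every safe benchmark is accounted for in each tool's reported figures.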

The cactus plot in Fig. 8(a) shows the performance of each tool on all safe 
benchmarks. DIFFY was able to prove most of the programs within three 
seconds. The cactus plot in Fig. 9(a) shows the performance of each tool on safe 
benchmarks in the C1 category. VAJRA and DIFFY perform equally well in the C1 
category. This is due to the fact that both tools perform efficient inductive 
reasoning. DIFFY outperforms VERIABS and VIAP in this category. The cactus 
plot in Fig. 10(a) shows the performance of each tool on safe benchmarks in the 
combined categories C2 and C3, which are difficult for VAJRA as most of these 
programs are not within its scope. DIFFY outperforms all other tools in 
categories C2 and C3. VERIABS was an order of magnitude slower than DIFFY 
on the programs it was able to verify. VERIABS spends a significant amount 
of time in trying techniques from its portfolio, including VAJRA, before one of 
them succeeds in verifying the assertion or takes up the entire time allotted to it. 
VIAP took 70 seconds more on average as compared to DIFFY to verify the 
given benchmark. VIAP also spends a large portion of time in trying different 
tactics implemented in the tool and solving the recurrence relations in programs. 
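
For readers unfamiliar with cactus plots, the sketch below shows one common way to derive the plotted points from per-benchmark runtimes: solved benchmarks are sorted by runtime, and the k-th point records the time taken by the k-th fastest one. This convention, the function names, and the runtimes used in the usage note are assumptions for illustration; they are not data from the experiments.

```python
def cactus_points(times, timeout):
    """Return (k, t) pairs: the k-th fastest solved benchmark took t seconds.

    `times` holds one runtime per benchmark, or None when the tool was
    inconclusive. Runs that exceed `timeout` count as unsolved.
    """
    solved = sorted(t for t in times if t is not None and t <= timeout)
    return list(enumerate(solved, start=1))


def solved_within(times, timeout, budget):
    """Count the benchmarks solved in at most `budget` seconds."""
    return sum(1 for _, t in cactus_points(times, timeout) if t <= budget)
```

For example, with the made-up runtimes [2.5, None, 0.4, 61.0, 1.1] and a 60s timeout, cactus_points yields three points, and solved_within with a budget of 3 seconds counts all three. A curve that extends further to the right and stays lower indicates a tool that solves more benchmarks in less time.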

Our technique reports property violations when the base case of the analy- 
sis fails for small fixed values of N. While the focus of our work is on proving 
assertions, we report results on unsafe versions of the safe benchmarks from our 
test-suite. DIFFY was able to detect a property violation in 142 unsafe programs 
and was inconclusive on 4 benchmarks. VAJRA detected violations in 115 pro- 
grams and was inconclusive on 31 programs. VERIABS reported 125 programs as 
unsafe and ran out of time on 21 programs. VIAP reported property violations 
in 120 programs, was inconclusive on 23 programs and timed out on 3 programs. 
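
The idea of reporting a violation when the base case fails for small fixed values of N can be sketched as follows. This is a simplified illustration and not DIFFY's implementation: concrete execution stands in for the solver query, the program is deterministic, and the names find_violation, fill_and_sum, and max_n are hypothetical.

```python
def find_violation(program, assertion, max_n=3):
    """Check the assertion for N = 1..max_n by running the program.

    Returns the smallest N at which the assertion fails, or None if it
    holds for every value tried (which proves nothing about larger N).
    """
    for n in range(1, max_n + 1):
        state = program(n)
        if not assertion(n, state):
            return n
    return None


def fill_and_sum(n):
    """Hypothetical array program: a[i] = i + 1, s = sum of a."""
    a = [i + 1 for i in range(n)]
    return {"a": a, "s": sum(a)}


def safe_assertion(n, st):
    # Holds for all N: 1 + 2 + ... + N = N(N+1)/2.
    return 2 * st["s"] == n * (n + 1)


def unsafe_assertion(n, st):
    # Holds for N = 1 but fails from N = 2 onwards.
    return st["s"] == n * n
```

With these definitions, find_violation(fill_and_sum, unsafe_assertion) returns 2, the smallest N that exposes the bug, while the safe variant returns None and would have to be handled by the inductive step.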

The cactus plot in Fig. 8(b) shows the performance of each tool on all unsafe 
benchmarks. DIFFY was able to detect a violation faster than all other tools and 
on more benchmarks from the test-suite. Figure 9(b) and Fig. 10(b) give a finer 
glimpse of the performance of these tools on the categories that we have defined. 
In the C1 category, DIFFY and VAJRA have comparable performance and DIFFY 
disproves the same number of benchmarks as VAJRA and VIAP. In the C2 and C3 
categories, DIFFY detects property violations in more benchmarks than the 
other tools, and in less time. 

To observe any changes in the performance of these tools, we also ran them with 
an increased timeout of 100 seconds (Fig. 11). Performance remains unchanged for 
DIFFY, VAJRA and VERIABS on both safe and unsafe benchmarks, and for VIAP 
on unsafe benchmarks. VIAP was able to additionally verify 89 safe programs 
in categories C1 and C2 with the increased time limit. 

Fig. 11. Cactus plots. TO = 100s. (a) Safe benchmarks (b) Unsafe benchmarks 


6 Related Work 


Techniques Based on Induction. Our work is related to several efforts that apply 
inductive reasoning to verify properties of array programs. Our work subsumes 
the full-program induction technique in [12] that works by inducting on the 
entire program via a program parameter N. We propose a principled method 
for computation and use of difference invariants, instead of computing difference 
programs, which is more challenging. An approach to construct safety proofs 
by automatically synthesizing squeezing functions that shrink program traces is 
proposed in [27]. Such functions are not easy to synthesize, whereas difference 
invariants are relatively easy to infer. In [11], the post-condition is inductively 
established by identifying a tiling relation between the loop counter and array 
indices used in the program. Our technique can verify programs from [11] when 
supplied with the tiling relation. The approach in [44] identifies recurrent 
program fragments for 
induction using the loop counter. They require restrictive data dependencies, 
called commutativity of statements, to move peeled iterations across subsequent 
loops. Unfortunately, these restrictions are not satisfied by a large class of pro- 
grams in practice, where our technique succeeds. 


Difference Computation. Computing differences of program expressions has been 
studied for incremental computation of expensive expressions [35,41], optimizing 
programs with arrays [34], and checking data-structure invariants [45]. These 
differences are not always well suited for verifying properties, in contrast with 
the difference invariants which enable inductive reasoning in our case. 


Logic Based Reasoning. In [21], trace logic that implicitly captures inductive 
loop invariants is described. They use theorem provers to introduce and prove 
lemmas at arbitrary control locations in the program. Unlike their technique, we 
focus primarily on universally quantified and quantifier-free properties, although 
a restricted class of existentially quantified properties can be handled by our 
technique (see [13] for more details). VIAP [42] translates the program to a 
quantified first-order logic formula using the scheme proposed in [32]. It uses 
a portfolio of tactics to simplify and prove the generated formulas. Dedicated 
solvers for recurrences are used whereas our technique adapts induction for han- 
dling recurrences. 


Invariant Generation. Several techniques generate invariants for array programs. 
QUIC3 [25] and FreqHorn [9,19] infer universally quantified invariants over arrays for 
Constrained Horn Clauses (CHCs). Template-based techniques [8,23,47] search 
for inductive quantified invariants by instantiating parameters of a fixed set 
of templates. We generate relational invariants, which are often easier to infer 
compared to inductive quantified invariants for each loop. 


Abstraction-Based Techniques. Counterexample-guided abstraction refinement 
using prophecy variables for programs with arrays is proposed in [36]. 
VERIABS [1] uses a portfolio of techniques, specifically to identify loops that can 
be soundly abstracted by a bounded number of iterations. VAPHOR [38] 
transforms array programs to array-free Horn formulas to track a bounded number 
of array cells. BOOSTER [3] combines lazy abstraction based interpolation [2] and 
acceleration [10,28] for array programs. Abstractions in [16,18,22,26,29,33,37] 
implicitly or explicitly partition the range of array indices to infer and prove 
facts on array segments. In contrast, our method does not rely on abstractions. 


7 Conclusion 


We presented a novel verification technique that combines generation of 
difference invariants and inductive reasoning. Difference invariants relate 
corresponding variables and arrays from the two versions of a program and are often 
easy to infer and prove. We have instantiated these techniques in our 
prototype DIFFY. Experiments show that DIFFY outperforms the tools that won 
the Arrays sub-category in SV-COMP 2019, 2020 and 2021. Although we have 
focused on universal and quantifier-free properties in this paper, the technique 
applies to some classes of existential properties as well. The interested reader 
is referred to [13] for more details. Investigating the use of synthesis techniques 
for automatic generation of difference invariants to verify properties of 
array-manipulating programs is part of future work. 
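
As an informal illustration of what a difference invariant can look like, the sketch below runs a small two-loop program for parameter values N-1 and N and checks relations between corresponding variables and array cells of the two runs. The program, the specific relations, and all names are assumptions made for this example; concrete testing over a few values of N is used here only to illustrate the relations and is not a proof.

```python
def run(n):
    """Hypothetical program P_N: fill a with 1s while summing into s,
    then set b[i] = a[i] + s. Postcondition: every b[i] equals N + 1."""
    a = [0] * n
    b = [0] * n
    s = 0
    for i in range(n):
        a[i] = 1
        s += a[i]
    for i in range(n):
        b[i] = a[i] + s
    return {"s": s, "a": a, "b": b}


def difference_invariant_holds(n):
    """Relate the final states of P_{N-1} and P_N on the shared indices."""
    prev, cur = run(n - 1), run(n)
    return (
        cur["s"] == prev["s"] + 1
        and all(cur["a"][i] == prev["a"][i] for i in range(n - 1))
        and all(cur["b"][i] == prev["b"][i] + 1 for i in range(n - 1))
    )


def postcondition_holds(n):
    return all(x == n + 1 for x in run(n)["b"])
```

Intuitively, if the postcondition holds for N-1 and the relations above hold, then b[i] = N + 1 follows for the shared indices i < N-1, leaving only the last cell to be checked separately.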